Optimal Decentralized Control of System with Partially Exchangeable Agents
Aditya Mahajan, McGill University
Joint work with Jalal Arabneydi
Allerton Conference on Communication, Control, and Computing, 28 Sep, 2016
Decentralized control with exchangeable agents–(Arabneydi and Mahajan)
Optimal decentralized control: Applications and Theory
Internet of Things, Smart Grids, Sensor Networks, Swarm Robotics
Salient features
Multiple decision makers. Access to different information. Cooperate towards a common objective.
Series of positive results in the last 10-15 years:
funnel causality, quadratic invariance, common information approach, and others.
Explicit solutions are rare and typically exist for systems with two or three agents.
Are there features that are present in the applications but are missing from the theory?
System with exchangeable agents
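Dynamics $\mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, \mathbf{w}_t)$ with per-step cost $c_t(\mathbf{x}_t, \mathbf{u}_t)$.
Pair of exchangeable agents
Agents $i$ and $j$ are exchangeable if $\mathcal{X}^i = \mathcal{X}^j$, $\mathcal{U}^i = \mathcal{U}^j$, $\mathcal{W}^i = \mathcal{W}^j$ and, with $\sigma_{ij}$ denoting the permutation that swaps agents $i$ and $j$,
$f_t(\sigma_{ij}\mathbf{x}_t, \sigma_{ij}\mathbf{u}_t, \sigma_{ij}\mathbf{w}_t) = \sigma_{ij}(f_t(\mathbf{x}_t, \mathbf{u}_t, \mathbf{w}_t))$,
$c_t(\sigma_{ij}\mathbf{x}_t, \sigma_{ij}\mathbf{u}_t) = c_t(\mathbf{x}_t, \mathbf{u}_t)$.
Set of exchangeable agents
A set of agents is exchangeable if every pair in that set is exchangeable.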
System with partially exchangeable agents
. . . is a multi-agent system where the set of agents can be partitioned into disjoint sets of exchangeable agents.
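As a rough illustration of the pairwise definition, the sketch below tests numerically whether swapping two agents commutes with a toy dynamics function and leaves a toy cost unchanged. The functions `f`, `c`, `swap`, `is_exchangeable_pair` and all numerical values are hypothetical and chosen only for illustration; agents have scalar states here, and a randomized check of this kind can refute exchangeability but does not prove it.

```python
import numpy as np

def swap(v, i, j):
    """Apply the permutation sigma_ij: exchange entries i and j of v."""
    out = v.copy()
    out[[i, j]] = out[[j, i]]
    return out

def f(x, u, w):
    # Toy dynamics (an assumption): identical local terms plus coupling through the mean.
    return 0.9 * x + 0.5 * u + 0.1 * x.mean() + w

def c(x, u):
    # Toy per-step cost (an assumption): symmetric in the agents.
    return float(np.sum(x**2) + np.sum(u**2) + x.mean() ** 2)

def is_exchangeable_pair(i, j, n=4, trials=100, seed=0):
    """Randomized test of the two conditions in the definition for agents i and j."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, u, w = rng.normal(size=(3, n))
        same_dyn = np.allclose(
            f(swap(x, i, j), swap(u, i, j), swap(w, i, j)),
            swap(f(x, u, w), i, j),
        )
        same_cost = np.isclose(c(swap(x, i, j), swap(u, i, j)), c(x, u))
        if not (same_dyn and same_cost):
            return False
    return True
```

Calling `is_exchangeable_pair(0, 1)` on this toy system should report that the pair passes the check, since both `f` and `c` treat all agents identically.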
Notation
For agent $i$ of sub-population $k$
$x^i_t \in \mathbb{R}^{d^k_x}$: state of agent $i$
$u^i_t \in \mathbb{R}^{d^k_u}$: control action of agent $i$
For sub-population $k$
$\mathcal{N}^k$: set of agents in sub-population $k$
$\bar{x}^k_t = \frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} x^i_t$: mean-field of states
$\bar{u}^k_t = \frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} u^i_t$: mean-field of actions
$N$: number of heterogeneous agents; $K$: number of sub-populations
For the entire population
$\mathcal{N} = \mathcal{N}^1 \cup \cdots \cup \mathcal{N}^K$: set of all agents
$\mathcal{K} = \{1, \dots, K\}$: set of all sub-populations
$\mathbf{x}_t = (x^i_t)_{i \in \mathcal{N}}$: global state of the system
$\mathbf{u}_t = (u^i_t)_{i \in \mathcal{N}}$: joint actions of all agents
$\bar{\mathbf{x}}_t = \mathrm{vec}(\bar{x}^1_t, \dots, \bar{x}^K_t)$: global mean-field of states
$\bar{\mathbf{u}}_t = \mathrm{vec}(\bar{u}^1_t, \dots, \bar{u}^K_t)$: global mean-field of actions
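The notation above can be made concrete with a small sketch that computes the per-sub-population mean-fields and stacks them into the global mean-field. The helper `mean_fields`, the partition `subpop`, and the state values are hypothetical; note that the two sub-populations are deliberately given different state dimensions.

```python
import numpy as np

def mean_fields(states, subpop):
    """Return per-sub-population empirical means and their stacked vector.

    states: dict mapping agent index -> state vector
    subpop: dict mapping sub-population index k -> list of agent indices
    """
    x_bar = {k: np.mean([states[i] for i in agents], axis=0)
             for k, agents in subpop.items()}
    stacked = np.concatenate([x_bar[k] for k in sorted(x_bar)])
    return x_bar, stacked

# Hypothetical partition with K = 2 sub-populations and N = 5 agents.
subpop = {1: [0, 1, 2], 2: [3, 4]}
states = {
    0: np.array([1.0, 0.0]), 1: np.array([2.0, 1.0]), 2: np.array([3.0, 2.0]),
    3: np.array([10.0]), 4: np.array([20.0]),
}
x_bar, x_bar_stacked = mean_fields(states, subpop)
```

Here `x_bar[1]` is the two-dimensional mean of the first sub-population, `x_bar[2]` the scalar mean of the second, and `x_bar_stacked` plays the role of the global mean-field of states.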
Linear quadratic system with partially exchangeable agents
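Dynamics: $\mathbf{x}_{t+1} = A_t \mathbf{x}_t + B_t \mathbf{u}_t + \mathbf{w}_t$
Cost: $\sum_{t=1}^{T} \big[\mathbf{x}_t^\top Q_t \mathbf{x}_t + \mathbf{u}_t^\top R_t \mathbf{u}_t\big]$
Irrespective of the information structure, such a system is equivalent to a mean-field coupled system.
Agent dynamics in sub-population $k$:
$x^i_{t+1} = A^k_t x^i_t + B^k_t u^i_t + D^k_t \bar{\mathbf{x}}_t + E^k_t \bar{\mathbf{u}}_t + w^i_t$
Cost:
$\sum_{t=1}^{T} \Big[ \sum_{k \in \mathcal{K}} \sum_{i \in \mathcal{N}^k} \frac{1}{|\mathcal{N}^k|} \big[ (x^i_t)^\top Q^k_t x^i_t + (u^i_t)^\top R^k_t u^i_t \big] + \bar{\mathbf{x}}_t^\top P^x_t \bar{\mathbf{x}}_t + \bar{\mathbf{u}}_t^\top P^u_t \bar{\mathbf{u}}_t \Big]$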
There is a long history of mean-field approximations
Mean-field approximation in statistical physics (Weiss 1907; Landau 1937)
"It is a well-known phenomenon in many branches of the exact and physical sciences that very great numbers are often easier to handle than those of medium size. An almost exact theory of a gas, containing about $10^{25}$ freely moving particles, is incomparably easier than that of the solar system, made up of 9 major bodies… This is, of course, due to the excellent possibility of applying the laws of statistics and probabilities in the first case."
von Neumann and Morgenstern, Theory of Games and Economic Behavior (1944), §2.4.2
Mean-field approximations in Game Theory (anonymous games)
Jovanovic and Rosenthal 1988; Bergin and Bernhardt 1995; Weintraub, Benkard, and Van Roy 2008; . . .
Mean-field approximations in Systems and Control (mean-field games)
Huang, Caines, and Malhamé 2003, . . .; Lasry and Lions 2006, . . .
Our results are different. There is no approximation! The results are applicable to systems with an arbitrary (not necessarily large) number of agents.
Main idea: What happens if mean-field is observed?
Mean-field sharing information structure
$I^i_t = \{x^i_{1:t}, u^i_{1:t-1}, \bar{\mathbf{x}}_{1:t}\}$
Is it a restrictive assumption?
Not really. The mean-field can be shared using small communication overhead (using consensus algorithms).
We later provide approximate results when the mean-field is not shared.
Not one of the known tractable information structures
Not partially nested (or stochastically nested). Not quadratic invariant. Not partial history sharing.
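The remark that the mean-field can be shared through consensus can be illustrated with a minimal average-consensus sketch. The talk does not specify which consensus algorithm is meant, so everything here is an assumption: a ring communication topology, scalar states, equal weights of 1/3, and the hypothetical helper name `ring_consensus`.

```python
import numpy as np

def ring_consensus(values, iters=200):
    """Each agent repeatedly averages its value with its two ring neighbours.

    The update matrix is doubly stochastic, so the iterates converge to the
    empirical mean of the initial values (assumed topology: a ring).
    """
    z = np.asarray(values, dtype=float).copy()
    for _ in range(iters):
        z = (z + np.roll(z, 1) + np.roll(z, -1)) / 3.0
    return z

x = np.array([1.0, 4.0, 2.0, 9.0, 4.0])   # hypothetical local states
z = ring_consensus(x)                      # every entry approaches x.mean()
```

Each agent only exchanges one number per iteration with two neighbours, which is the sense in which the communication overhead is small in this sketch.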
A surprisingly simple solution . . .
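Parallel axis theorem
$\frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} (x^i_t)^\top Q^k_t x^i_t = \frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} (\breve{x}^i_t)^\top Q^k_t \breve{x}^i_t + (\bar{x}^k_t)^\top Q^k_t \bar{x}^k_t$,
where $\breve{x}^i_t = x^i_t - \bar{x}^k_t$.
Decoupled per-step cost
$\sum_{k \in \mathcal{K}} \sum_{i \in \mathcal{N}^k} \frac{1}{|\mathcal{N}^k|} \big[ (\breve{x}^i_t)^\top Q^k_t \breve{x}^i_t \big] + \bar{\mathbf{x}}_t^\top (\bar{Q}_t + P^x_t) \bar{\mathbf{x}}_t$ + similar u-terms
Noise-coupled dynamics
$\breve{x}^i_{t+1} = A^k_t \breve{x}^i_t + B^k_t \breve{u}^i_t + \breve{w}^i_t$,
$\bar{\mathbf{x}}_{t+1} = \bar{A}_t \bar{\mathbf{x}}_t + \bar{B}_t \bar{\mathbf{u}}_t + \bar{\mathbf{w}}_t$
We still have a non-classical information structure.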
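The parallel axis identity is easy to check numerically for one sub-population. The sketch below draws random states and a random positive semi-definite weight (all values hypothetical) and evaluates both sides; the cross terms vanish because the deviations from the mean-field sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 3                        # hypothetical sub-population size and state dimension
X = rng.normal(size=(n, d))        # row i holds x^i_t
G = rng.normal(size=(d, d))
Q = G @ G.T                        # symmetric positive semi-definite weight Q^k_t

x_bar = X.mean(axis=0)             # mean-field of the sub-population
X_breve = X - x_bar                # deviations from the mean-field

# Left side: (1/n) * sum_i (x^i)^T Q x^i
lhs = np.einsum("id,de,ie->", X, Q, X) / n
# Right side: (1/n) * sum_i (x_breve^i)^T Q x_breve^i + x_bar^T Q x_bar
rhs = np.einsum("id,de,ie->", X_breve, Q, X_breve) / n + x_bar @ Q @ x_bar
```

The two scalars `lhs` and `rhs` agree up to floating-point error, which is what lets the per-step cost split into a deviation part and a mean-field part.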
Assume centralized information and use certainty equivalence
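Local states
Dynamics: $\breve{x}^i_{t+1} = A^k_t \breve{x}^i_t + B^k_t \breve{u}^i_t + \breve{w}^i_t$
Cost: $(\breve{x}^i_t)^\top Q^k_t \breve{x}^i_t + (\breve{u}^i_t)^\top R^k_t \breve{u}^i_t$
Control law: $\breve{u}^i_t = \breve{L}^k_t \breve{x}^i_t$
Gains: $\breve{L}^k_t = -(\cdots)^{-1} (B^k_t)^\top \breve{M}^k_{t+1} A^k_t$
Riccati equation: $\breve{M}^k_{1:T} = \mathrm{DRE}(A^k_{1:T}, B^k_{1:T}, Q^k_{1:T}, R^k_{1:T})$
$K$ equations, one for each sub-population.
Mean-field state
Dynamics: $\bar{\mathbf{x}}_{t+1} = \bar{A}_t \bar{\mathbf{x}}_t + \bar{B}_t \bar{\mathbf{u}}_t + \bar{\mathbf{w}}_t$
Cost: $\bar{\mathbf{x}}_t^\top (P^x_t + \bar{Q}_t) \bar{\mathbf{x}}_t + \bar{\mathbf{u}}_t^\top (P^u_t + \bar{R}_t) \bar{\mathbf{u}}_t$
Control law: $\bar{\mathbf{u}}_t = \bar{L}_t \bar{\mathbf{x}}_t$
Gains: $\bar{L}_t = -(\cdots)^{-1} (\bar{B}_t)^\top \bar{M}_{t+1} \bar{A}_t$
Riccati equation: $\bar{M}_{1:T} = \mathrm{DRE}(\bar{A}_{1:T}, \bar{B}_{1:T}, \bar{Q}_{1:T} + P^x_{1:T}, \bar{R}_{1:T} + P^u_{1:T})$
1 equation for all mean-fields.
$u^i_t = \breve{u}^i_t + \bar{u}^k_t = \breve{L}^k_t (x^i_t - \bar{x}^k_t) + \bar{L}^k_t \bar{\mathbf{x}}_t$
Optimal centralized solution can be implemented with mean-field sharing information structure.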
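A minimal sketch of how the two Riccati recursions and the resulting control law might be computed is given below. Several things here are assumptions rather than the talk's own statements: the elided factor in the gains is filled with the textbook LQR expression $(R + B^\top M B)^{-1}$, the matrices are time-invariant, there is a single sub-population ($K = 1$) with scalar states so that $\bar{A} = A + D$, $\bar{B} = B + E$, $\bar{Q} = Q$, $\bar{R} = R$, and all numerical values and the names `dre_gains` and `control` are hypothetical.

```python
import numpy as np

def dre_gains(A, B, Q, R, T):
    """Backward Riccati recursion; returns the gains [L_1, ..., L_{T-1}]."""
    M = Q.copy()                                   # M_T = Q
    gains = []
    for _ in range(T - 1):                         # t = T-1, ..., 1
        # Textbook LQR gain (assumed form of the elided factor).
        L = -np.linalg.solve(R + B.T @ M @ B, B.T @ M @ A)
        M = Q + A.T @ M @ (A + B @ L)              # M_t
        gains.append(L)
    return gains[::-1]

# Hypothetical single sub-population (K = 1) with scalar states.
A, B, D, E = (np.array([[v]]) for v in (0.9, 1.0, 0.2, 0.1))
Q, R, Px, Pu = (np.array([[v]]) for v in (1.0, 0.5, 2.0, 0.0))
T = 10

L_breve = dre_gains(A, B, Q, R, T)                    # deviation (local) gains
L_bar = dre_gains(A + D, B + E, Q + Px, R + Pu, T)    # mean-field gains

def control(x, t):
    """u^i = L_breve (x^i - x_bar) + L_bar x_bar for all agents; x has shape (n, 1)."""
    x_bar = x.mean(axis=0)
    return (x - x_bar) @ L_breve[t].T + x_bar @ L_bar[t].T

x = np.array([[1.0], [2.0], [6.0]])    # hypothetical states of three agents
u = control(x, 0)                      # actions at the first time step
```

Each agent needs only its own state and the mean-field to evaluate `control`, and the number of Riccati recursions is one per sub-population plus one for the mean-field, independent of the number of agents.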
Solution generalizes to . . .
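Major-minor setup
One major agent and a population of minor agents.
Tracking cost function
$\sum_{k \in \mathcal{K}} \sum_{i \in \mathcal{N}^k} \frac{1}{|\mathcal{N}^k|} \big[ (x^i_t - \mathring{x}^i_t)^\top Q^k_t (x^i_t - \mathring{x}^i_t) + (u^i_t)^\top R^k_t u^i_t \big] + (\bar{\mathbf{x}}_t - r_t)^\top P^x_t (\bar{\mathbf{x}}_t - r_t) + \bar{\mathbf{u}}_t^\top P^u_t \bar{\mathbf{u}}_t$
Systems coupled through weighted mean-field
$\bar{x}^k_t = \frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} \lambda^i x^i_t$, $\quad \bar{u}^k_t = \frac{1}{|\mathcal{N}^k|} \sum_{i \in \mathcal{N}^k} \lambda^i u^i_t$.
But what if the mean-field is not observed?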
Partial mean-field sharing information structure
Notation
We will compare performance with a system where the mean-field is completely observed. To avoid confusion, use
State: $s^i_t$; Actions: $v^i_t$;
and similar notation for the mean-field, $\bar{s}^k_t$, etc.
Set $\mathcal{S}$: MF observed. Set $\mathcal{S}^c$: MF not observed.
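Estimated mean-field
$\mathbf{z}_t = (z^1_t, \dots, z^K_t) = \mathbb{E}\big[\bar{\mathbf{s}}_t \mid \{\bar{s}^k_t\}_{k \in \mathcal{N}}\big]$,
where
$z^k_{t+1} = \bar{s}^k_{t+1}$ for $k \in \mathcal{S}$,
$z^k_{t+1} = A^k_t z^k_t + (B^k_t \bar{L}^k_t + D^k_t + E^k_t \bar{L}_t)\, \mathbf{z}_t$ for $k \notin \mathcal{S}$.
Set $\mathcal{S}$: MF observed. Set $\mathcal{S}^c$: MF not observed.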
Certainty equivalence controller and its performance
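Certainty equivalence controller
$u^i_t = \breve{L}^k_t (s^i_t - z^k_t) + \bar{L}^k_t \mathbf{z}_t$
Key Lemma
Under the certainty equivalence control: $\breve{s}^i_t = \breve{x}^i_t$. Thus,
$\hat{J} - J^* = \mathbb{E}\Big[ \sum_{t=1}^{T} \big[ \bar{\mathbf{s}}_t^\top \hat{Q}_t \bar{\mathbf{s}}_t + \bar{\mathbf{v}}_t^\top \hat{R}_t \bar{\mathbf{v}}_t - \bar{\mathbf{x}}_t^\top \hat{Q}_t \bar{\mathbf{x}}_t - \bar{\mathbf{u}}_t^\top \hat{R}_t \bar{\mathbf{u}}_t \big] \Big]$
$= \mathbb{E}\Big[ \sum_{t=1}^{T} \begin{bmatrix} \zeta_t \\ \xi_t \end{bmatrix}^\top \tilde{Q} \begin{bmatrix} \zeta_t \\ \xi_t \end{bmatrix} \Big]$ (quadratic cost),
where $\zeta^k_t = \bar{x}^k_t - z^k_t$ and $\xi^k_t = \bar{s}^k_t - z^k_t$.
Moreover,
$\begin{bmatrix} \zeta_{t+1} \\ \xi_{t+1} \end{bmatrix} = \tilde{A}_t \begin{bmatrix} \zeta_t \\ \xi_t \end{bmatrix} + \begin{bmatrix} h \circ \bar{\mathbf{w}}_t \\ h \circ \bar{\mathbf{w}}_t \end{bmatrix}$ (linear dynamics).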
Certainty equivalence controller and its performance
Exact performance
$\hat{J} - J^* = \mathrm{Tr}(\tilde{X}_1 \tilde{M}_1) + \sum_{t=1}^{T-1} \mathrm{Tr}(\tilde{W}_t \tilde{M}_{t+1})$, where $\tilde{M}_{1:T} = \mathrm{DLE}(\tilde{A}_{1:T}, \tilde{Q}_{1:T})$.
Performance bound
Let $n = \min_{k \notin \mathcal{S}} |\mathcal{N}^k|$. Suppose all noises are independent. Then, there exists a matrix $C$ such that $\tilde{X}_1 \le C/n$ and $\tilde{W}_t \le C/n$. Thus,
$\hat{J} - J^* \in \mathcal{O}(T/n)$.
Infinite horizon
Results extend to the infinite horizon setup under standard assumptions. For both discounted and average cost setups:
$\hat{J} - J^* \in \mathcal{O}(1/n)$, c.f. $\mathcal{O}(1/\sqrt{n})$ in MFG.
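The trace formula for the exact performance can be sketched as a backward Lyapunov recursion. The talk does not spell out DLE, so the recursion below ($\tilde{M}_T = \tilde{Q}$, $\tilde{M}_t = \tilde{Q} + \tilde{A}^\top \tilde{M}_{t+1} \tilde{A}$) is the standard discrete Lyapunov form and is an assumption, as are the time-invariant matrices, the numerical values, and the helper names `performance_gap` and `direct_cost`. The second function computes the same expected quadratic cost by propagating the covariance forward, as a cross-check.

```python
import numpy as np

def performance_gap(A, Q, X1, W, T):
    """Tr(X1 M_1) + sum_{t=1}^{T-1} Tr(W M_{t+1}) with M_T = Q, M_t = Q + A^T M_{t+1} A."""
    M = Q.copy()                      # M_T
    total = 0.0
    for _ in range(T - 1):            # t = T-1, ..., 1
        total += np.trace(W @ M)      # uses M_{t+1}
        M = Q + A.T @ M @ A           # M_t
    return total + np.trace(X1 @ M)   # M is now M_1

def direct_cost(A, Q, X1, W, T):
    """Same quantity via forward covariance propagation: sum_t Tr(Q Sigma_t)."""
    Sigma, total = X1.copy(), 0.0
    for _ in range(T):
        total += np.trace(Q @ Sigma)
        Sigma = A @ Sigma @ A.T + W
    return total

# Hypothetical error dynamics, weight, initial covariance and noise covariance.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = np.eye(2)
X1 = 0.5 * np.eye(2)
W = 0.1 * np.eye(2)
gap = performance_gap(A, Q, X1, W, T=20)
```

Scaling `X1` and `W` by $1/n$ scales `gap` by $1/n$ as well, which is the mechanism behind the $\mathcal{O}(T/n)$ bound stated above.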
An example: Demand response with minimum discomfort to users
Demand response of space heaters
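Dynamics of space heater
$x^i_{t+1} = a(x^i_t - x_{\mathrm{nom}}) + b(u^i_t + u_{\mathrm{nom}}) + w^i_t$
Objective
$\mathbb{E}\Big[ \sum_{t=1}^{T} \Big( \frac{1}{n} \sum_{i=1}^{n} \big[ q_t (x^i_t - x^i_{\mathrm{des}})^2 + r_t (u^i_t)^2 \big] + p_t (\bar{\mathbf{x}}_t - \bar{\mathbf{x}}^{\mathrm{ref}}_t)^2 \Big) \Big]$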
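The demand-response model can be simulated with a few lines of code. The sketch below rolls out the heater dynamics exactly as written above and evaluates the tracking objective along one sample path; the parameter values, the constant weights `q`, `r`, `p`, and the function names `simulate_heaters` and `objective` are hypothetical and are not taken from the talk.

```python
import numpy as np

def simulate_heaters(x0, u, a=0.95, b=0.1, x_nom=20.0, u_nom=1.0,
                     noise_std=0.0, seed=0):
    """Roll out x^i_{t+1} = a (x^i_t - x_nom) + b (u^i_t + u_nom) + w^i_t.

    x0 has shape (n,), u has shape (T, n); returns states of shape (T + 1, n).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((u.shape[0] + 1, x0.size))
    x[0] = x0
    for t in range(u.shape[0]):
        w = noise_std * rng.normal(size=x0.size)
        x[t + 1] = a * (x[t] - x_nom) + b * (u[t] + u_nom) + w
    return x

def objective(x, u, x_des, x_ref, q=1.0, r=0.1, p=1.0):
    """Sample-path cost: sum over t of the per-agent average plus the mean-field tracking term."""
    xs = x[:-1]                                          # states at the times actions are taken
    per_agent = q * (xs - x_des) ** 2 + r * u ** 2       # shape (T, n)
    tracking = p * (xs.mean(axis=1) - x_ref) ** 2        # shape (T,)
    return float(np.sum(per_agent.mean(axis=1) + tracking))

x0 = np.array([21.0, 22.0])          # hypothetical initial temperatures
u = np.zeros((1, 2))                 # one step, no control
x = simulate_heaters(x0, u)
cost = objective(x, u, x_des=x0, x_ref=np.array([21.5]))
```

Averaging `objective` over many noisy rollouts would give a Monte Carlo estimate of the expectation in the objective.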
Everyone follows the mean-field
Everyone follows the optimal strategy
Conclusion
Salient Features
The solution complexity depends only on the number of sub-populations, not on the number of agents.
Agents don’t need to be aware of the number of agents.
Same performance as centralized information.
Thus, centralized performance can be achieved by simply sharing the mean-field (empirical mean) of the states!
Generalizations
Noisy observation of mean-field
Delay in the observation of mean-field
Controlled Markov processes
arXiv:1609.00056
Optimal decentralized control: Applications and Theory
Internet of Things
Smart Grids
Sensor Networks
Swarm Robotics
Salient features
Multiple decision makers
Access to different information
Cooperate towards a common objective
Series of positive results in the last 10-15 years: funnel causality, quadratic invariance, common information approach, and others.
Explicit solutions are rare and typically exist for systems with two or three agents.
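System with exchangeable agents
Dynamics $\mathbf x_{t+1} = f_t(\mathbf x_t, \mathbf u_t, \mathbf w_t)$ with per-step cost $c_t(\mathbf x_t, \mathbf u_t)$.
Pair of exchangeable agents
Agents $i$ and $j$ are exchangeable if
$\mathcal X^i = \mathcal X^j$, $\mathcal U^i = \mathcal U^j$, $\mathcal W^i = \mathcal W^j$;
$f_t(\sigma_{ij}\mathbf x_t, \sigma_{ij}\mathbf u_t, \sigma_{ij}\mathbf w_t) = \sigma_{ij}(f_t(\mathbf x_t, \mathbf u_t, \mathbf w_t))$;
$c_t(\sigma_{ij}\mathbf x_t, \sigma_{ij}\mathbf u_t) = c_t(\mathbf x_t, \mathbf u_t)$,
where $\sigma_{ij}$ denotes the permutation that swaps the components of agents $i$ and $j$.
Set of exchangeable agents
A set of agents is exchangeable if every pair in that set is exchangeable.
System with partially exchangeable agents
. . . is a multi-agent system where the set of agents can be partitioned into disjoint sets of exchangeable agents.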
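The pairwise definition can be spot-checked numerically. The sketch below assumes scalar agents, linear dynamics $f(\mathbf x, \mathbf u, \mathbf w) = A\mathbf x + B\mathbf u + \mathbf w$, and a diagonal quadratic cost; all function names and matrices are hypothetical. Passing the check on a handful of sample points is only evidence of exchangeability, not a proof.

```python
def swap(v, i, j):
    """Apply sigma_ij: exchange components i and j of a list."""
    out = list(v)
    out[i], out[j] = out[j], out[i]
    return out


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dynamics(x, u, w, A, B):
    """Linear dynamics f(x, u, w) = A x + B u + w."""
    return [a + b + c for a, b, c in zip(matvec(A, x), matvec(B, u), w)]


def cost(x, u, q, r):
    """Diagonal quadratic per-step cost."""
    return (sum(qi * xi * xi for qi, xi in zip(q, x))
            + sum(ri * ui * ui for ri, ui in zip(r, u)))


def looks_exchangeable(i, j, A, B, q, r, samples):
    """Check the dynamics and cost conditions on the given sample points."""
    for x, u, w in samples:
        sx, su, sw = swap(x, i, j), swap(u, i, j), swap(w, i, j)
        if dynamics(sx, su, sw, A, B) != swap(dynamics(x, u, w, A, B), i, j):
            return False
        if cost(sx, su, q, r) != cost(x, u, q, r):
            return False
    return True


# Hypothetical 3-agent system: agents 0 and 1 enter symmetrically, agent 2 does not.
A = [[2, 1, 0], [1, 2, 0], [3, 3, 1]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
q, r = [1, 1, 5], [1, 1, 1]
samples = [([1, 0, 0], [0, 0, 0], [0, 0, 0]),
           ([0, 1, 2], [1, 2, 3], [4, 5, 6])]
pair_01 = looks_exchangeable(0, 1, A, B, q, r, samples)
pair_02 = looks_exchangeable(0, 2, A, B, q, r, samples)
```

In this example the agents split into the exchangeable set {0, 1} and the singleton {2}, which is the kind of partition the definition of a partially exchangeable system refers to.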
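Linear quadratic system with partially exchangeable agents
Dynamics
$$\mathbf x_{t+1} = A_t \mathbf x_t + B_t \mathbf u_t + \mathbf w_t$$
Cost
$$\sum_{t=1}^{T} \left[ \mathbf x_t^\top Q_t \mathbf x_t + \mathbf u_t^\top R_t \mathbf u_t \right]$$
Irrespective of the information structure, such a system is equivalent to a mean-field coupled system.
Agent dynamics in sub-population k
$$x^i_{t+1} = A^k_t x^i_t + B^k_t u^i_t + D^k_t \bar{\mathbf x}_t + E^k_t \bar{\mathbf u}_t + w^i_t$$
Cost
$$\sum_{t=1}^{T} \left[ \sum_{k \in \mathcal K} \sum_{i \in \mathcal N^k} \frac{1}{|\mathcal N^k|} \left[ (x^i_t)^\top Q^k_t x^i_t + (u^i_t)^\top R^k_t u^i_t \right] + \bar{\mathbf x}_t^\top P^x_t \bar{\mathbf x}_t + \bar{\mathbf u}_t^\top P^u_t \bar{\mathbf u}_t \right]$$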
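Averaging the agent dynamics over a sub-population shows how the mean-field itself evolves. The following is a sketch, assuming the mean-field of sub-population $k$ is the unweighted empirical mean $\bar x^k_t = \frac{1}{|\mathcal N^k|} \sum_{i \in \mathcal N^k} x^i_t$ (and likewise for $\bar u^k_t$ and $\bar w^k_t$), with breve symbols such as $\breve x^i_t = x^i_t - \bar x^k_t$ denoting deviations from that mean:

```latex
\begin{align*}
\bar x^k_{t+1}
  &= \frac{1}{|\mathcal N^k|} \sum_{i \in \mathcal N^k} x^i_{t+1}
   = A^k_t \bar x^k_t + B^k_t \bar u^k_t
     + D^k_t \bar{\mathbf x}_t + E^k_t \bar{\mathbf u}_t + \bar w^k_t, \\
\breve x^i_{t+1}
  &= x^i_{t+1} - \bar x^k_{t+1}
   = A^k_t \breve x^i_t + B^k_t \breve u^i_t + \breve w^i_t .
\end{align*}
```

The coupling terms $D^k_t \bar{\mathbf x}_t$ and $E^k_t \bar{\mathbf u}_t$ are common to every agent in the sub-population, so they cancel in the deviation dynamics and remain only in the mean-field dynamics.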
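A surprisingly simple solution . . .
Parallel axis theorem
$$\frac{1}{|\mathcal N^k|} \sum_{i \in \mathcal N^k} (x^i_t)^\top Q^k_t x^i_t = \frac{1}{|\mathcal N^k|} \sum_{i \in \mathcal N^k} (\breve x^i_t)^\top Q^k_t \breve x^i_t + (\bar x^k_t)^\top Q^k_t \bar x^k_t, \quad \text{where } \breve x^i_t = x^i_t - \bar x^k_t.$$
Decoupled per-step cost
$$\sum_{k \in \mathcal K} \sum_{i \in \mathcal N^k} \frac{1}{|\mathcal N^k|} \left[ (\breve x^i_t)^\top Q^k_t \breve x^i_t \right] + \bar{\mathbf x}_t^\top (\bar Q_t + P^x_t) \bar{\mathbf x}_t + \text{similar u-terms}$$
Noise-coupled dynamics
$$\breve x^i_{t+1} = A^k_t \breve x^i_t + B^k_t \breve u^i_t + \breve w^i_t, \qquad \bar{\mathbf x}_{t+1} = \bar A_t \bar{\mathbf x}_t + \bar B_t \bar{\mathbf u}_t + \bar{\mathbf w}_t$$
We still have a non-classical information structure.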
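The parallel axis identity used above can be checked numerically for a single sub-population. This is a small sketch with hypothetical states and weight matrix; the helper names `quad`, `mean`, and `parallel_axis_sides` are introduced here only for illustration.

```python
def quad(x, Q):
    """Quadratic form x^T Q x for a list x and nested-list matrix Q."""
    n = len(x)
    return sum(x[a] * Q[a][b] * x[b] for a in range(n) for b in range(n))


def mean(vectors):
    """Component-wise empirical mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def parallel_axis_sides(states, Q):
    """Return both sides of the identity for one sub-population."""
    n = len(states)
    x_bar = mean(states)
    lhs = sum(quad(x, Q) for x in states) / n
    deviations = [[xi - mi for xi, mi in zip(x, x_bar)] for x in states]
    rhs = sum(quad(d, Q) for d in deviations) / n + quad(x_bar, Q)
    return lhs, rhs


# Hypothetical sub-population with three agents and a 2-dimensional state.
states = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
Q = [[2.0, 1.0], [1.0, 3.0]]
lhs, rhs = parallel_axis_sides(states, Q)
```

The two sides agree because the deviations sum to zero within the sub-population, so the cross terms between the deviations and the mean vanish.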
Everyone follows the optimal strategy