SON Conflict Resolution using Reinforcement Learning with State Aggregation

Ovidiu Iacoboaiea†‡, Berna Sayrac†, Sana Ben Jemaa†, Pascal Bianchi‡
(†) Orange Labs, 38-40 rue du General Leclerc, 92130 Issy-les-Moulineaux, France
(‡) Telecom ParisTech, 37 rue Dareau, 75014 Paris, France
Figure: example of a SON conflict, e.g. a MRO instance and a MLB instance acting on the parameters of the same cell.
We consider:
– several SON functions (e.g. MRO tuning the HandOver Hysteresis, MLB tuning the CIO),
– each of which is instantiated on every cell, i.e. we have one SON instance per (function, cell) pair,
– SON instances are considered as black-boxes.

(*) MLB = Mobility Load Balancing; (*) MRO = Mobility Robustness Optimization; (*) CIO = Cell Individual Offset

Figure: SON functions SONF 1, …, SONF Z, each with one instance per cell.
The network at time t:
The SON at time t:
– each SON instance (function n, cell k) issues at time t an update request U_{t,n,k}: U_{t,n,k} ∈ [−1; 0), U_{t,n,k} ∈ (0; 1] and U_{t,n,k} = 0 is a request to decrease, increase and maintain the value of the target parameter, respectively
– the magnitude |U_{t,n,k}| signifies the criticalness of the update, i.e. how unhappy the SON instance is with the current parameter configuration
– we consider that U_{t,n,k} may also be void, for the case when a SON function is not tuning a certain parameter
The SON COordinator (SONCO) at time t:
– the SONCO answers each request with an action A_{t,n,k} ∈ {−1, 0, 1}
– A_{t,n,k} = 1 / A_{t,n,k} = −1 means that we increase/decrease the value of the target parameter P_{t,n,k} only if there exists a SON update request to do so; otherwise we maintain the value of P_{t,n,k}
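The request/action gating above can be sketched in a few lines. This is an illustrative sketch, not the deck's code: the function name, the step size, and the use of None for void requests are assumptions.

```python
def apply_sonco_action(p, request, action, step=1.0):
    """Update one target parameter p given a SON instance's request
    and the SONCO action, following the gating rule on this slide.

    request: float in [-1, 1] (sign = direction, magnitude = criticalness),
             or None for a void request (assumed encoding).
    action:  -1, 0 or +1 chosen by the coordinator.
    """
    if request is None or action == 0:
        return p                      # void request or blocked: keep p
    if action == 1 and request > 0:   # increase only if requested
        return p + step
    if action == -1 and request < 0:  # decrease only if requested
        return p - step
    return p                          # no matching request: maintain
```

For example, `apply_sonco_action(10.0, 0.7, 1)` returns 11.0, while `apply_sonco_action(10.0, -0.3, 1)` leaves the parameter at 10.0 because there is no request to increase.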
– the network state at time t+1 is a “random” function of the new parameter configuration P_{t+1} and some noise ξ_{t+1}
– the reward aggregates the criticalness of the new requests, r_{t+1} = −Σ_n Σ_k |U_{t+1,n,k}| (void requests counting as zero), i.e. the SONCO is rewarded for keeping the SON instances satisfied
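One plausible reading of the reward fragments on this slide is the negated total criticalness of the pending requests; under that assumption (the exact weighting in the deck may differ), the computation is:

```python
def sonco_reward(requests):
    """Negated total criticalness of the pending SON requests.

    Assumed form: r = -sum(|u|) over all non-void requests, so the
    coordinator earns a higher reward when instances are 'happier'.
    None marks a void request (a parameter the function does not tune).
    """
    return -sum(abs(u) for u in requests if u is not None)
```

For example, `sonco_reward([0.5, -0.25, None, 0.0])` evaluates to -0.75.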
– the SONCO objective is to maximize the expected discounted return E[ Σ_{t=0}^{∞} γ^t r_{t+1} ]
– this is addressed with Q-learning: Q(s, a) ← Q(s, a) + α ( r_{t+1} + γ max_{a′} Q(s′, a′) − Q(s, a) ), where the Q-function is decomposed across the SON functions, Q(s, a) = Σ_{n∈𝒩} Q_n(s, a)
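The Q-learning machinery invoked on this slide, in minimal tabular form (the state labels and the Q-table layout are illustrative assumptions, not the deck's implementation):

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)   # unseen (state, action) pairs start at 0
actions = (-1, 0, 1)     # the SONCO's per-request actions
q_learning_step(Q, s="s0", a=1, r=-0.75, s_next="s1", actions=actions)
```

Starting from an all-zero table, this step moves Q(s0, 1) toward the observed reward: 0.1 · (−0.75 + 0.9 · 0 − 0) = −0.075.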
– state aggregation: each Q_n is approximated on a reduced state space, Q_n(s, a) ≈ W_n(σ(s), a), where σ maps every state to its aggregate class
– the SONCO state at time t is built from the current update requests U_{t,n,k}
– one learner is maintained per SON function n (n ∈ 𝒩)
– state aggregation maps each Q_n to a compact table W_n, keeping the state space tractable
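The mapping from a Q-table to a smaller aggregated table W can be sketched as Q-learning run on aggregated states. The aggregation map below (keeping only the sign pattern of the request vector) is an illustrative assumption, not the deck's actual mapping:

```python
from collections import defaultdict

def aggregate(state):
    """Illustrative aggregation map sigma: keep only the sign pattern
    of the request vector, collapsing many raw states into one class."""
    return tuple((u > 0) - (u < 0) for u in state)

def agg_q_learning_step(W, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q-learning on the aggregated state space:
    Q(s,a) ~ W(sigma(s), a), updated like an ordinary Q-table."""
    x, x_next = aggregate(s), aggregate(s_next)
    best_next = max(W[(x_next, a2)] for a2 in actions)
    W[(x, a)] += alpha * (r + gamma * best_next - W[(x, a)])

W = defaultdict(float)
actions = (-1, 0, 1)
# two distinct raw request vectors that fall into the same aggregate class
agg_q_learning_step(W, (0.7, -0.2), 1, -0.9, (0.1, -0.4), actions)
agg_q_learning_step(W, (0.3, -0.9), 1, -1.2, (0.2, -0.1), actions)
```

Both raw states (0.7, -0.2) and (0.3, -0.9) aggregate to (1, -1), so the two updates share a single entry of W: this sharing is what shrinks the state space.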
Simulations:
– performance is evaluated over the last 24h, when the CIOs become reasonably stable

Future work:
– analyzing the tracking capability of the algorithm
– HetNet scenarios