
SLIDE 1

Learning Queuing Networks by Recurrent Neural Networks

Giulio Garbi, Emilio Incerto and Mirco Tribastone
IMT School for Advanced Studies Lucca, Lucca, Italy
giulio.garbi@imtlucca.it
ICPE 2020 Virtual Conference, April 20–24, 2020

SLIDE 2

Motivation

  • Performance means revenue
  • «We are not the fastest retail site on the internet today» [Walmart, 2012]
  • «[…] page speed will be a ranking factor for mobile searches.» [Google]

→ It’s worth investing in system performance. How?

Garbi, Incerto, Tribastone 2

SLIDE 3

Motivation

  • Question: where to invest?
  • Performance estimation approaches:
  • Profiling: easy, but does not predict
  • Modeling: needs an expert and continuous updates, but gives predictions


SLIDE 4

Motivation: our vision

  • If we had a model, we could try all possible choices, forecast the outcomes, and choose the best option.

→ Automate model generation!


SLIDE 5

Our Main Contribution

  • Direct association between:
  • Model: Fluid Approximation of Closed Queuing Networks
  • Automation: Recurrent Neural Networks
  • Automatic generation of models from data


SLIDE 6

Model: Queuing Networks

  • A model representing contention for resources by clients
  • Clients request work from stations (resources)
  • Stations have a maximum concurrency level and a speed
  • Once served, clients request another resource according to a routing matrix

[Figure: three-station queuing network with stations ⟨µ1, s1⟩, ⟨µ2, s2⟩, ⟨µ3, s3⟩, queue lengths x1, x2, x3, and routing probabilities P1,2, P1,3, P2,1, P3,1]
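The fluid approximation mentioned on the contribution slide replaces the stochastic network with an ODE over the queue lengths. A minimal numerical sketch of the dynamics this figure depicts, with illustrative parameter values that are not from the talk:

```python
import numpy as np

# Three-station closed QN as in the figure; all numbers are assumptions.
mu = np.array([10.0, 4.0, 6.0])   # service rates (clients/time unit)
s = np.array([2.0, 1.0, 1.0])     # maximum concurrency per station
P = np.array([[0.0, 0.7, 0.3],    # routing matrix: P[i, j] = prob. i -> j
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

def fluid_step(x, dt):
    # Station i serves at rate mu_i * min(x_i, s_i); the served flow is
    # redistributed to the other stations via the routing matrix.
    served = mu * np.minimum(x, s)
    return x + dt * (P.T @ served - served)

x = np.array([20.0, 0.0, 0.0])    # all 20 clients start at station 1
for _ in range(50000):            # integrate to (near) steady state
    x = fluid_step(x, 1e-3)
```

Because every row of P sums to 1, `x.sum()` stays constant over time, which is exactly the closed-network invariant: clients move between stations but never leave.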


SLIDE 7

Model of a system

  • Resources → hardware
  • Routing matrix → program code
  • Clients → program instances

[Figure: the same three-station queuing network, with stations ⟨µi, si⟩, queue lengths xi, and routing probabilities Pi,j]


SLIDE 8

How our procedure works


Profiling → Learning → Model → Prediction → Changes

SLIDE 9

Recurrent Neural Networks

  • Recurrent neural networks (RNNs) work with sequences (e.g. time series)
  • We encode the model as an RNN with a custom structure.


[Figure: the RNN cell Cell_h, built from min and ∑ units; it takes throughputs t1, t2, …, tM and routed terms such as t1·P1,2 and t2·P2,1, and is unrolled over steps 1, 2, …, H−1]
SLIDE 10

Recurrent Neural Networks

  • The system parameters are directly encoded in the RNN cell

→ The learned model explains the system! (explainable neural network)

  • We can modify the learned model afterwards to make predictions!
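A minimal sketch of this encoding (my own reconstruction under stated assumptions, not the authors' exact cell): each cell performs one discretized step of the fluid equation, so the trainable weights are precisely the QN parameters (µ, s, P).

```python
import numpy as np

def rnn_cell(x, mu, s, P, dt):
    # One cell = one discretized step of the fluid equation; the min
    # units give the cell its queuing semantics.
    served = mu * np.minimum(x, s)
    return x + dt * (P.T @ served - served)

def unroll(x0, mu, s, P, dt, H):
    # Forward pass: H applications of the same cell (steps 1..H-1 in the
    # diagram). The weights (mu, s, P) are the QN parameters, so the
    # trained network is directly interpretable as a queuing model.
    xs = [x0]
    for _ in range(H):
        xs.append(rnn_cell(xs[-1], mu, s, P, dt))
    return np.stack(xs)

def loss(trace, mu, s, P, dt):
    # Fit predicted queue lengths to a measured trace.
    pred = unroll(trace[0], mu, s, P, dt, len(trace) - 1)
    return float(np.mean((pred - trace) ** 2))

# Toy check with made-up parameters: the loss vanishes at the true
# parameters and is positive for perturbed ones.
mu0 = np.array([5.0, 3.0]); s0 = np.array([2.0, 1.0])
P0 = np.array([[0.0, 1.0], [1.0, 0.0]])
trace = unroll(np.array([10.0, 0.0]), mu0, s0, P0, 0.01, 50)
```

In practice the loss would be minimized by gradient descent in an autodiff framework, with constraints keeping µ, s nonnegative and the rows of P stochastic.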


SLIDE 11

Synthetic case studies: setting

  • 10 random systems: five with M=5 stations, five with M=10 stations
  • Concurrency levels between 15 and 30
  • Service rates between 4 and 30 clients/time unit
  • 100 traces, each the average of 500 executions, with [0, 40M] clients
  • Learning time: 74 min for M=5 and 86 min for M=10
  • Error function: % of clients wrongly placed
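The slides do not spell out the formula for "% clients wrongly placed"; one plausible reading (an assumption on my part, not taken from the talk) is the total absolute queue-length difference, halved so that each misplaced client is counted once, as a percentage of the population:

```python
import numpy as np

def pct_misplaced(x_pred, x_real):
    # Assumed metric: a client missing from one station appears as an
    # extra client at another, so the absolute differences are halved
    # before normalizing by the total population N.
    n = x_real.sum()
    return 100.0 * np.abs(x_pred - x_real).sum() / (2.0 * n)
```

For example, `pct_misplaced(np.array([8.0, 2.0]), np.array([6.0, 4.0]))` gives 20.0: two of the ten clients sit at the wrong station.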


SLIDE 12

Synthetic case studies: prediction with different #clients

[Plot: prediction error (err) vs. number of clients N, from 100 to 800, for M=5 and M=10]

No significant difference across network sizes and numbers of clients. → Good predictive power under different conditions



SLIDE 13

Synthetic case studies: prediction with different concurrency levels

[Plot: prediction error (err) vs. number of clients N, from 50 to 250, for M=5 and M=10]

Concurrency was increased so as to resolve the bottleneck. → The learning outcome is resilient to changes in part of the network



SLIDE 14

Real case study: setting

  • node.js web application, replicated 3 times
  • Python script simulates N clients
  • Learning time: 27 min for N=26


[Figure: the deployed system (load balancer LB, replicas C1, C2, and W) side by side with the learned QN model, stations M1, M2, M3, M4]

SLIDE 15

Real case study: prediction with different #clients

[Plots: queue length over t(s), RNN-learned QN vs. real system, for stations M1–M4]
M3 is the bottleneck, and this affects the UX. We need to solve it…


[The plots are repeated for four population sizes]

N = 52: err = 6.46%; N = 78: err = 5.03%; N = 104: err = 6.45%; N = 130: err = 9.05%

SLIDE 16

Real case study: prediction with different structure

…by increasing the concurrency level of M3 (err: 5.98%)

[Plot: queue length over t(s), RNN-learned QN vs. real system, stations M1–M4]

…by changing the LB scheduling policy (err: 6.10%)

[Plot: queue length over t(s), RNN-learned QN vs. real system, stations M1–M4]


Bottleneck solved. Nice results also on a real HW+SW system.
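The what-if step above can be sketched on the learned fluid model: once (µ, s, P) are learned, a structural change such as raising a station's concurrency level is just an edit to the parameter arrays. A sketch with hypothetical numbers, not the values actually learned in the talk:

```python
import numpy as np

# Hypothetical learned parameters for a 4-station model in which
# M3 (index 2) is the bottleneck; all values are illustrative only.
mu = np.array([8.0, 8.0, 3.0, 8.0])
s = np.array([4.0, 4.0, 1.0, 4.0])
P = np.array([[0.0, 0.4, 0.4, 0.2],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])

def steady_state(x0, mu, s, P, dt=1e-2, steps=20000):
    # Integrate the learned fluid model forward until it settles.
    x = x0.copy()
    for _ in range(steps):
        served = mu * np.minimum(x, s)
        x = x + dt * (P.T @ served - served)
    return x

x0 = np.array([52.0, 0.0, 0.0, 0.0])
before = steady_state(x0, mu, s, P)
# "What if we triple M3's concurrency level?" is just an array edit:
after = steady_state(x0, mu, s * np.array([1.0, 1.0, 3.0, 1.0]), P)
```

Comparing `before[2]` and `after[2]` shows the backlog at M3 shrinking once its concurrency grows, which is the kind of prediction the slide validates against the real system.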

SLIDE 17

Limits

  • Many traces required to learn the system.
  • System must be observed at high frequency.
  • Layered systems currently not supported.
  • Resilient to limited changes, not extensive ones.


SLIDE 18

Related work

  • Performance models from code (e.g. PerfPlotter; not predictive)
  • Modelling black-box systems (e.g. Siegmund et al., tree-structured models)
  • Program-driven generation of models (e.g. Hrischuk et al., distributed components communicating via RPC)
  • Estimation of service demands in QNs through several techniques (we estimate both service demands and the routing matrix)


SLIDE 19

Conclusions

  • We provided a method to estimate QN parameters using an RNN that converges to feasible parameters.
  • With the estimated parameters, it is possible to predict the evolution of the system with a population different from the one used during learning, or after structural modifications.
  • Future work: apply the technique to more complex systems (e.g. LQN, multiclass), use other learning methodologies (e.g. neural ODEs), and improve the accuracy of the results.


SLIDE 20

Thank you!