Automated Application and Resource Management in the Cloud Nikos - - PowerPoint PPT Presentation

SLIDE 1

Automated Application and Resource Management in the Cloud

Nikos Parlavantzas
HDR defense, 15th June 2020

SLIDE 2

The demand for collaboration and entertainment services is skyrocketing

►2,900%: growth of daily participants between December and April
►2.5 million: new connected users in 1 week, up 25%
►12 million: new daily active users in 1 week, up 37.5%
►15.8 million: new subscribers between January and March

SLIDE 3

Made possible by the cloud

Estimated Cloud Growth in US

https://internetassociation.org/publications/examining-economic-contributions-cloud-united-states-economy/

http://ec.europa.eu/newsroom/document.cfm?doc_id=41184

Worldwide Enterprise Spending on Cloud and Data Centers

SLIDE 4

Cloud brings benefits to both providers and customers

PROVIDER CUSTOMERS

APP

SLIDE 5

Cloud brings benefits to both providers and customers

►Provider: pools and shares resources among customers
►Customers: obtain and release resources on demand while paying only for actual use

SLIDE 6

To optimise these benefits, these actors require automated management

PROVIDER CUSTOMERS

SLIDE 7

Building automated management systems is difficult

PROVIDER CUSTOMERS

Satisfying provider objectives (e.g., increasing profit) despite variations in:

  • customer workload
  • QoS and prices of underlying infrastructure and cloud services

SLIDE 8

Building automated management systems is difficult

PROVIDER CUSTOMERS

Satisfying customer objectives (e.g., maintaining performance, reducing costs) despite variations in:

  • application workload and requirements
  • cloud service capabilities, QoS, and prices

SLIDE 9

Outline

►Application Management for Customers
►Resource Management for Providers
►Application and Resource Management in Private Clouds

SLIDE 11

We discuss two management systems for public cloud providers

Both provide SLA support and profit optimisation

►Resource and execution management for SaaS providers
►SLA-based resource management for PaaS providers

SLIDE 12

How to manage service delivery in order to increase the SaaS provider profit?

SaaS system that

►delivers a (master-worker) application as a service
►supports SLAs that specify response time and reliability QoS
►uses a single IaaS cloud

SLIDE 13

Our approach provides mechanisms for creating SLAs and managing service delivery

Decides how to use the mechanisms based on cost-benefit calculations

[A. Lage PhD] [CPE'17a] [CLOUD'12] [ECOWS'11] [MONA'10] [CIT'10]
With A. Lage, J.-L. Pazat

[Diagram labels: SLAs, requests; QoS translation, execution management, resource management; worker replacement, under-provisioning, contract cancellation, request cancellation]

SLIDE 14

Applying the mechanisms increases provider profit

Experiments with an audio encoding service on Grid’5000

§ Contract cancellation → 4x profit increase
§ Under-provisioning → 20-50% profit increase
§ Worker replacement → 60% profit increase

[Figure: Qu4DS architecture across the SaaS, PaaS, and IaaS layers, serving Customer 1 ... Customer n. Labelled parts: SLA management (templates, QoS translation, negotiation), execution management (request management), resource management (booking, allocation, resource availability), and configurable mechanisms (under-provisioning, contract rescission)]

SLIDE 15

How to share resources under SLA constraints in order to increase the PaaS provider profit?

PaaS system that

►hosts applications of various types
►supports SLAs
►uses private resources and resources rented from IaaS clouds

SLIDE 16

Our approach decomposes resources into application type-specific groups

►Groups decide independently how to allocate their resources to applications, exchanging, if necessary, resources with other groups and renting resources from public clouds
►Decisions are based on a profit optimisation policy

[D. Dib PhD] [CPE'17b] [CCGrid'14] [ORMaCloud'13]
With D. Dib, C. Morin

SLIDE 17

Optimisation policy at a glance

►Three options for obtaining missing resources:

§ waiting for private resources to become available
§ obtaining them from running applications
§ renting them from a public cloud

SLIDE 18

Optimisation policy at a glance

►Three options for obtaining missing resources:

§ waiting for private resources to become available
§ obtaining them from running applications
§ renting them from a public cloud

►Bids correspond to estimated penalties for each option

[Diagram labels: new app, bids, public cloud price]
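
The bid comparison can be sketched in Python as follows. This is only an illustration: the slide does not state the selection rule, so the sketch assumes that the option with the lowest bid (lowest estimated penalty) is chosen. The option names and the numeric bids are hypothetical.

```python
def choose_option(bids):
    """Return the option whose bid (estimated penalty) is lowest.

    `bids` maps an option name to its estimated penalty. Picking the
    minimum is an assumption made for this sketch.
    """
    if not bids:
        raise ValueError("at least one option is required")
    return min(bids, key=bids.get)


# Hypothetical bids for the three options listed on the slide.
example_bids = {
    "wait_for_private": 4.0,
    "take_from_running_app": 2.5,
    "rent_from_public_cloud": 3.0,
}
chosen = choose_option(example_bids)
print(chosen)
```

In this example the resources would be taken from a running application, because that option carries the smallest estimated penalty.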

SLIDE 19

The approach increases provider profit

► Experiments in Grid’5000 (90 nodes)

§ MapReduce and batch applications

► The optimisation policy generates 9.02% more provider profit than a baseline policy and has minimal QoS impact

[Figure: architecture with a Client Manager, Cluster Managers, a Resource Manager, and a VM manager over private and public resources. Virtual clusters VC a and VC b host applications App. 1 to App. 4, each with a controller, on private and public VMs. Labelled interactions: SLA negotiation, submission, results, request transmitting, VM exchange, adding/removing private VMs, adding/removing public VMs]

SLIDE 20

To sum up

►A complete, SLA-driven management solution for SaaS providers
►An SLA-based PaaS solution hosting various application types on a hybrid cloud
►No support for dynamically adding resources to running applications
►Only homogeneous, coarse-grained resources

SLIDE 21

Outline

►Application Management for Customers
►Resource Management for Providers
►Application and Resource Management in Private Clouds

SLIDE 22

We discuss two application management systems for IaaS customers

►Managing modular multi-cloud applications
►Managing epidemic simulation applications with monolithic structure

SLIDE 23

Deploying and managing applications in multi-cloud environments is challenging

►Producing an initial deployment that satisfies requirements
►Dynamically adapting the deployment to react to environment changes (e.g., changes in workload, resource prices)

SLIDE 24

Platforms that address this challenge adopt a similar architecture

[Diagram: the application, with Modelling, Deciding, Executing, and Monitoring components]

SLIDE 25

Platforms that address this challenge adopt a similar architecture

[Diagram: the application, with Modelling, Deciding, Executing, and Monitoring components]

[PaaSage] [C. Ruiz PhD] [PDP'18] [ARMS'17] [UCC'16] [IOT360'16]
With L. Pham, A. Sinha, C. Morin, C. Ruiz, H. Duran-Limon

SLIDE 27

How to continuously optimise both the performance and cost of a multi-cloud application?

►Existing solutions fail to consider adaptation costs together with adaptation benefits

SLIDE 28

We assume that the application generates revenue for the customer depending on performance

[Diagram: the customer's application (APP) generates revenue that depends on app performance]

Objective: Optimise profit (i.e., revenue – cloud charges)

SLIDE 29

We propose a continuous deployment optimisation process

[Diagram: three steps (Reasoning, Comparison, Validation). Labelled artefacts: workload, performance, and prices; current deployment model; proposed deployment model; reconfiguration plan; reconfiguration actions]

SLIDE 30

Validation at a glance

[Plot: profit over time, showing the current profit at the present moment ("now")]

SLIDE 31

Validation at a glance

[Plot: profit over time, starting from now. Labels: reconfiguration duration, stability interval, current profit, profit during reconfiguration, profit after reconfiguration, total profit of doing reconfiguration]

SLIDE 32

Validation at a glance

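[Plot: profit over time, starting from now, with the reconfiguration duration and the stability interval marked. The total profit of doing nothing (at the current profit) is compared with the total profit of doing reconfiguration; if the latter is greater, adapt]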
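
The comparison behind the validation step can be sketched as follows. This is a minimal illustration and not the implementation used in the work: it assumes constant profit rates in each phase, and it assumes that the stability interval starts when the reconfiguration ends. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReconfigurationEstimate:
    current_profit_rate: float       # profit per hour if nothing changes
    profit_rate_during: float        # profit per hour while reconfiguring
    profit_rate_after: float         # profit per hour once reconfigured
    reconfiguration_duration: float  # hours
    stability_interval: float        # hours (assumed to follow the reconfiguration)


def total_profit_doing_nothing(e: ReconfigurationEstimate) -> float:
    """Profit over the whole horizon when the deployment is left unchanged."""
    horizon = e.reconfiguration_duration + e.stability_interval
    return e.current_profit_rate * horizon


def total_profit_reconfiguring(e: ReconfigurationEstimate) -> float:
    """Profit over the same horizon when the reconfiguration is carried out."""
    return (e.profit_rate_during * e.reconfiguration_duration
            + e.profit_rate_after * e.stability_interval)


def should_adapt(e: ReconfigurationEstimate) -> bool:
    """Adapt only if reconfiguring yields strictly more total profit."""
    return total_profit_reconfiguring(e) > total_profit_doing_nothing(e)


long_stable = ReconfigurationEstimate(10.0, 4.0, 12.0, 0.5, 6.0)
short_stable = ReconfigurationEstimate(10.0, 4.0, 12.0, 0.5, 1.0)
print(should_adapt(long_stable), should_adapt(short_stable))
```

With a long stability interval the higher profit after reconfiguration outweighs the loss incurred during it; with a short one the same reconfiguration is rejected.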

SLIDE 33

The approach is effective in optimizing performance and cost

► Evaluated the approach using experiments with a web application on multiple clouds
► Showed effectiveness of reconfiguration validation
► Demonstrated PaaSage using applications developed by partners

SLIDE 34

How to enable legacy epidemic simulation applications to run in the cloud?

Challenges

►decomposing applications into services
►facilitating their deployment and operation on multiple clouds
►supporting elasticity and handling failures

SLIDE 35

DiFFuSE supports structuring simulators as distributed, interacting services

►Provides reusable code for commonly required functionality
►Provides data exchange mechanisms supporting replication and failure handling
►Builds on PaaSage to support multi-cloud deployment and elasticity

[CPE'20] [CloudCom'17]
With L. Pham, C. Morin, S. Arnoux, G. Beaunée, L. Qi, P. Gontier, P. Ezanno

SLIDE 36

DiFFuSE enables simulation applications to fully exploit cloud platforms

►Used to restructure two legacy epidemic simulators developed by INRAE

§ Spread of bovine viral diarrhea virus (BVDV)
§ Spread of Mycobacterium avium subspecies paratuberculosis (MAP)

SLIDE 37

Applying DiFFuSE required a small number of code changes

►BVDV application

§ 6.3% of original code was modified

►MAP application

§ 1.78% of original code was modified

SLIDE 38

Applying DiFFuSE brought many benefits

► Experiments using the two applications showed that DiFFuSE

§ can exploit variable numbers and types of cloud resources
§ handles failures automatically
§ allows elastically modifying resource allocations

SLIDE 39

To sum up

►Tools for adapting multi-cloud applications and migrating legacy simulation applications to the cloud
►Exploited by academic and industrial partners

§ PaaSage open source platform
§ DiFFuSE open source framework

►Using DiFFuSE remains difficult for scientists

SLIDE 40

Outline

►Application Management for Customers
►Resource Management for Providers
►Application and Resource Management in Private Clouds

SLIDE 41

How to share a private infrastructure among customers?

►Provider: "Allocate my resources efficiently"
►Customers: "Need results as soon as possible", "Need results before 9 am", "Need throughput >50K tps"

SLIDE 42

Existing solutions rely on priorities or quotas assigned by administrators

►Assume that administrators know the value of resources for all customers at all times or that customers truthfully declare such values

SLIDE 43

We applied a market-based approach together with application adaptation

►Provider distributes fine-grained virtual resources through an auction
►Customer employs controllers that dynamically adapt the resource demand

[S. Costache PhD] [JPDC'17] [EuroPar'13] [CloudCom'13] [HPCC'12] [VHPC'11]
With S. Costache, C. Morin, S. Kortas

SLIDE 44

Resource Market

[Diagram: three applications, each with its own controller and running in VMs, on top of the physical infrastructure]

SLIDE 45

Receiving bids

[Diagram: VM scheduler, nodes, scheduling period. Bids received: $(c_{cpu}^{1}, c_{mem}^{1})$, $(c_{cpu}^{2}, c_{mem}^{2})$, $(c_{cpu}^{3}, c_{mem}^{3})$]

SLIDE 46

Calculating allocations

Ideal allocations:

$B_i = \frac{c_i}{\sum_j c_j} \, D$

[Diagram: VM scheduler, nodes, scheduling period]
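
A small Python sketch of this proportional-share rule is shown below. It is an illustration under stated assumptions: the slide does not define $D$, so the sketch reads it as the total amount of one resource being divided in the current period, and the VM names and bid values are hypothetical.

```python
def ideal_allocations(bids, capacity):
    """Split `capacity` among bidders in proportion to their bids.

    Implements B_i = c_i / sum_j(c_j) * D for a single resource, where
    `bids` maps a bidder to c_i and `capacity` plays the role of D
    (an assumption, since the slide does not define D).
    """
    total = sum(bids.values())
    if total <= 0:
        # No positive bids: nothing is allocated in this period.
        return {bidder: 0.0 for bidder in bids}
    return {bidder: capacity * bid / total for bidder, bid in bids.items()}


# Hypothetical CPU bids from two VMs sharing 8 cores.
cpu_allocations = ideal_allocations({"vm_a": 1.0, "vm_b": 3.0}, capacity=8.0)
print(cpu_allocations)
```

A VM that bids three times as much as another receives three times the share, and the shares always add up to the capacity when at least one bid is positive.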

SLIDE 47

Placing VMs

[Diagram: VM scheduler, nodes, scheduling period]

SLIDE 48

Charging

[Diagram: VM scheduler, nodes, scheduling period]

SLIDE 49

The approach provides policies for vertical and horizontal scaling

[Diagram: the controller applies a policy. Inputs: prices, performance, SLO, budget. Actions: modify bids or number of VMs, suspend application, …]
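
As an illustration of what a vertical scaling policy could look like, the sketch below adjusts a bid once per period. It is a hypothetical policy written for this example, not one of the policies from the work: the multiplicative step, the minimum bid, and the rule "raise the bid when performance is below the SLO target, lower it otherwise" are all assumptions.

```python
def next_bid(current_bid, performance, slo_target, budget_per_period,
             step=0.1, min_bid=0.01):
    """Return the bid to submit for the next period.

    Hypothetical rule: raise the bid by `step` (10% by default) when the
    measured performance is below the SLO target, lower it by the same
    fraction otherwise, and never bid above the per-period budget or
    below `min_bid`.
    """
    if performance < slo_target:
        proposed = current_bid * (1 + step)
    else:
        proposed = current_bid * (1 - step)
    return max(min_bid, min(proposed, budget_per_period))


# Performance below target: the bid goes up. Above target: it goes down.
raised = next_bid(10.0, performance=0.8, slo_target=1.0, budget_per_period=20.0)
lowered = next_bid(10.0, performance=1.2, slo_target=1.0, budget_per_period=20.0)
capped = next_bid(19.5, performance=0.5, slo_target=1.0, budget_per_period=20.0)
print(raised, lowered, capped)
```

A horizontal scaling policy would follow the same pattern but change the number of VMs instead of the bid.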

SLIDE 50

The approach supports various types of applications and SLOs

►MPI applications with deadline-based or best-effort SLOs
►Condor with SLO of increasing throughput
►Torque with SLO of reducing wait time

►Demonstrated adaptation to changes in resource availability and SLO

SLIDE 51

The approach enables customers to satisfy their SLOs

[Figure: (a) total customer satisfaction (10^3 credits) and (b) percentage of successfully finished applications, each plotted against the inter-arrival factor for FCFS, EDF, and Merkat]

Simulations with large traces (120 nodes)

SLIDE 52

The approach enables customers to satisfy their SLOs

Platform   % Met Deadlines   Satisfaction
Merkat     82.5%             764330
Maui       58.1%             -178581

Testbed experiments (10 nodes)

SLIDE 53

To sum up

►Generic and extensible application and resource management approach
►Efficient resource allocations
►Developing policies remains a complex and ad hoc process
►More suitable for completely decentralised systems

SLIDE 54

Outline

►Application Management for Customers
►Resource Management for Providers
►Application and Resource Management in Private Clouds

SLIDE 55

Some findings

Across the three parts (Application Management for Customers, Resource Management for Providers, Application and Resource Management in Private Clouds):

►SLAs simplify decision making
►Application management requires considering reconfiguration costs as well as benefits
►The market-based approach is an effective way to coordinate interactions among applications and the platform

SLIDE 56

Outlook

Management solutions for additional domains

►Function-as-a-service (FaaS)

§ Lack of QoS support

►Fog computing

§ Higher level of dynamism and heterogeneity
§ More economic actors

[Y. Bouizem PhD]

SLIDE 57

Outlook

Next generation management methods

►Decision-making

§ Machine learning techniques

►Decentralised management

§ Economic and pricing mechanisms

SLIDE 58

Thank you

André Lage-Freitas, Stefania Costache, Djawida Dib, Carlos Ruiz Diaz, Linh Manh Pham, Arnab Sinha, Yasmina Bouizem, Baptiste Goupille-Lescar, Christine Morin, Hector Duran-Limon, Jean-Louis Pazat, Samuel Kortas, Eric Lenormand, Yvon Jégou, Sandie Arnoux, Gaël Beaunée, Luyuan Qi, Philippe Gontier, Pauline Ezanno, MYRIADS team, …

SLIDE 59

References

► [CPE'20] N. Parlavantzas, L. M. Pham, C. Morin, S. Arnoux, G. Beaunée, L. Qi, P. Gontier, and P. Ezanno. “A Service-Based Framework for Building and Executing Epidemic Simulation Applications in the Cloud”. Concurrency and Computation: Practice and Experience 32:5, 2020

► [PDP'18] N. Parlavantzas, L. M. Pham, A. Sinha, and C. Morin. “Cost-Effective Reconfiguration for Multi-Cloud Applications”. 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). Cambridge, 2018

► [JPDC'17] S. Costache, S. Kortas, C. Morin, and N. Parlavantzas. “Market-Based Autonomous Resource and Application Management in Private Clouds”. Journal of Parallel and Distributed Computing 100, 2017

► [CPE'17a] A. Lage Freitas, N. Parlavantzas, and J. Pazat. “Cloud Resource Management Driven by Profit Augmentation”. Concurrency and Computation: Practice and Experience 29:4, 2017

► [CPE'17b] D. Dib, N. Parlavantzas, and C. Morin. “SLA-Based PaaS Profit Optimization”. Concurrency and Computation: Practice and Experience 29:21, 2017

► [CloudCom'17] L. M. Pham, N. Parlavantzas, C. Morin, S. Arnoux, L. Qi, P. Gontier, and P. Ezanno. “DiFFuSE, a Distributed Framework for Cloud-Based Epidemic Simulations: A Case Study in Modelling the Spread of Bovine Viral Diarrhea Virus”. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). Hong Kong, 2017

► [ARMS'17] C. Ruiz, H. A. Duran-Limon, and N. Parlavantzas. “An RLS Memory-Based Mechanism for the Automatic Adaptation of VMs on Cloud Environments”. 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing-ARMS-CC’17. Washington, DC, USA, 2017

SLIDE 60

References

► [UCC'16] C. Ruiz, H. A. Duran-Limon, and N. Parlavantzas. “Towards a Software Product Line-Based Approach to Adapt IaaS Cloud Configurations”. 9th International Conference on Utility and Cloud Computing - UCC ’16. Shanghai, China, 2016

► [IOT360'16] T. Kirkham, A. Sinha, N. Parlavantzas, B. Kryza, P. Fremantle, K. Kritikos, and B. Aziz. “Privacy Aware On-Demand Resource Provisioning for IoT Data Processing”. In: Internet of Things. IoT Infrastructures. 2016

► [CCGrid'14] D. Dib, N. Parlavantzas, and C. Morin. “SLA-Based Profit Optimization in Cloud Bursting PaaS”. 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 2014

► [EuroPar'13] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “On the Use of a Proportional-Share Market for Application SLO Support in Clouds”. In: Euro-Par 2013 Parallel Processing. Berlin, Heidelberg, 2013

► [CloudCom'13] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “Merkat: A Market-Based SLO-Driven Cloud Platform”. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science. IEEE, 2013

► [ORMaCloud'13] D. Dib, N. Parlavantzas, and C. Morin. “Meryn: Open, SLA-Driven, Cloud Bursting PaaS”. First ACM Workshop on Optimization Techniques for Resources Management in Clouds. ORMaCloud ’13. ACM, New York, New York, USA

SLIDE 61

References

► [HPCC'12] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “Themis: Economy-Based Automatic Resource Scaling for Cloud Systems”. In: 2012 IEEE 14th International Conference on High Performance Computing and Communication. 2012

► [CLOUD'12] A. L. Freitas, N. Parlavantzas, and J.-L. Pazat. “An Integrated Approach for Specifying and Enforcing SLAs for Cloud Services”. In: Proceedings of the 5th International Conference on Cloud Computing (CLOUD). IEEE, 2012

► [ECOWS'11] A. Lage Freitas, N. Parlavantzas, and J.-L. Pazat. “Cost Reduction through SLA-Driven Self-Management”. In: 2011 IEEE Ninth European Conference on Web Services. IEEE, 2011

► [VHPC'11] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “An Economic Approach for Application QoS Management in Clouds”. Euro-Par 2011: Parallel Processing Workshops. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012

► [MONA'10] A. Lage Freitas, N. Parlavantzas, and J.-L. Pazat. “A QoS Assurance Framework for Distributed Infrastructures”. 3rd International Workshop on Monitoring, Adaptation and Beyond - MONA ’10. ACM Press, New York, New York, USA, 2010

► [CIT'10] A. Lage-Freitas, N. Parlavantzas, and J.-L. Pazat. “A Self-Adaptable Approach for Easing the Development of Grid-Oriented Services”. IEEE International Conference on Computer and Information Technology (CIT). Bradford, UK, 2010