Automated Application and Resource Management in the Cloud
Nikos Parlavantzas
HDR defense, 15th June 2020
The demand for collaboration and entertainment services is skyrocketing
►2,900% growth of daily participants between December and April
►2.5 million new connected users in 1 week, up 25%
►12 million new daily active users in 1 week, up 37.5%
►15.8 million new subscribers between January and March
2
Made possible by the cloud
Estimated Cloud Growth in US
https://internetassociation.org/publications/examining-economic-contributions-cloud-united-states-economy/
http://ec.europa.eu/newsroom/document.cfm?doc_id=41184
Worldwide Enterprise Spending on Cloud and Data Centers
3
Cloud brings benefits to both providers and customers
►Provider: pools and shares resources among customers
►Customers: obtain and release resources on demand while paying only for actual use
5
To optimise these benefits, these actors require automated management
PROVIDER CUSTOMERS
6
Building automated management systems is difficult
PROVIDER CUSTOMERS
Satisfying provider objectives (e.g., increasing profit) despite variations in:
- customer workload
- QoS and prices of underlying infrastructure and cloud services
7
Building automated management systems is difficult
PROVIDER CUSTOMERS
Satisfying customer objectives (e.g., maintaining performance, reducing costs) despite variations in:
- application workload and requirements
- cloud service capabilities, QoS, and prices
8
Outline
►Application Management for Customers
►Resource Management for Providers
►Application and Resource Management in Private Clouds
9
We discuss two management systems for public cloud providers
Both provide SLA support and profit optimisation
►Resource and execution management for SaaS providers
►SLA-based resource management for PaaS providers
11
How to manage service delivery in order to increase the SaaS provider profit?
SaaS system that
►delivers a (master-worker) application as a service
►supports SLAs that specify response time and reliability QoS
►uses a single IaaS cloud
12
Our approach provides mechanisms for creating SLAs and managing service delivery
Decides how to use the mechanisms based on cost-benefit calculations
[A. Lage PhD][CPE'17a][CLOUD’12] [ECOWS'11][MONA'10][CIT'10] With A. Lage, J.-L. Pazat
13
[Figure: mechanisms for SLAs and requests: QoS translation, execution management, resource management, worker replacement, under-provisioning, contract cancellation, request cancellation]
Applying the mechanisms increases provider profit
Experiments with an audio encoding service on Grid’5000
§ Contract cancellation → 4x profit increase
§ Under-provisioning → 20-50% profit increase
§ Worker replacement → 60% profit increase
[Figure: Qu4DS architecture across the SaaS, PaaS, and IaaS layers, serving Customer 1 to Customer n. Components: SLA management (templates, QoS translation, negotiation), execution management, resource management (booking, allocation). Configurable mechanisms: under-provisioning, contract rescission.]
14
How to share resources under SLA constraints in order to increase the PaaS provider profit?
PaaS system that
►hosts applications of various types
►supports SLAs
►uses private resources and resources rented from IaaS clouds
15
Our approach decomposes resources into application type-specific groups
►Groups decide independently how to allocate their resources to applications, exchanging, if necessary, resources with other groups and renting resources from public clouds
►Decisions are based on a profit optimisation policy
[D. Dib PhD] [CPE'17b][CCGrid'14] [ORMaCloud'13] With D. Dib, C. Morin
16
Optimisation policy at a glance
►Three options for obtaining missing resources:
§ waiting for private resources to become available
§ obtaining them from running applications
§ renting them from a public cloud
►Bids correspond to estimated penalties for each option
18
[Figure: public cloud price, bid of the new application, and bid of the running applications]
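The choice among the three options can be sketched as a comparison of estimated penalties. The snippet below is only an illustration: the `Option` type, the `choose_option` helper, the numeric values, and the rule of picking the lowest estimated penalty are assumptions made for the example, not the policy as implemented in the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way of obtaining missing resources (hypothetical type)."""
    name: str
    estimated_penalty: float  # estimated cost to the provider of taking this option


def choose_option(options):
    """Return the option with the lowest estimated penalty (assumed decision rule)."""
    if not options:
        raise ValueError("at least one option is required")
    return min(options, key=lambda option: option.estimated_penalty)


# Illustrative values only: a bid for waiting, a bid from running
# applications, and the price of renting from a public cloud.
options = [
    Option("wait for private resources", 12.0),
    Option("take resources from running applications", 7.5),
    Option("rent from a public cloud", 9.0),
]

best = choose_option(options)
print(best.name)
```

With these made-up numbers, taking resources from running applications is the cheapest option; changing any penalty changes the outcome accordingly.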
The approach increases provider profit
►Experiments in Grid’5000 (90 nodes)
§ MapReduce and batch applications
►The optimisation policy generates 9.02% more provider profit than a baseline policy and has minimal QoS impact
[Figure: architecture with users, Client Manager, Resource Manager, Cluster Managers, virtual clusters (VC a, VC b), application controllers (App. 1 to App. 4), and VM manager over private and public VMs. Interactions: SLA negotiation, submission, results, request transmitting, VM exchange, adding/removing private and public VMs.]
19
To sum up
►A complete, SLA-driven management solution for SaaS providers
►An SLA-based PaaS solution hosting various application types on a hybrid cloud
►No support for dynamically adding resources to running applications
►Only homogeneous, coarse-grained resources
20
We discuss two application management systems for IaaS customers
►Managing modular multi-cloud applications
►Managing epidemic simulation applications with monolithic structure
22
Deploying and managing applications in multi-cloud environments is challenging
►Producing an initial deployment that satisfies requirements
►Dynamically adapting the deployment to react to environment changes (e.g., changes in workload, resource prices)
23
Platforms that address this challenge adopt a similar architecture
Application
25
Modelling Deciding Executing Monitoring
[PaaSage] [C. Ruiz PhD] [PDP’18] [ARMS’17] [UCC’16] [IOT360'16] With L. Pham, A. Sinha, C. Morin, C. Ruiz, H. Duran-Limon
How to continuously optimise both the performance and cost of a multi-cloud application?
►Existing solutions fail to consider adaptation costs together with adaptation benefits
27
We assume that the application generates revenue for the customer depending on performance
Objective: Optimise profit (i.e., revenue – cloud charges)
28
We propose a continuous deployment optimisation process
[Figure: process stages: Reasoning, Comparison, Validation. Inputs: workload, performance, prices, current deployment model. Artifacts produced: proposed deployment model, reconfiguration plan, reconfiguration actions.]
29
Validation at a glance
[Figure: profit over time, starting from now. The plot shows the current profit, the profit during reconfiguration (over the reconfiguration duration), and the profit after reconfiguration, within the stability interval.]
►Adapt if the total profit of doing reconfiguration > total profit of doing nothing
32
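The validation rule (adapt only when the total profit of reconfiguring exceeds the total profit of doing nothing) can be sketched as follows. This is a simplified illustration under stated assumptions: profit rates are taken as constant within each phase, the comparison horizon is assumed to be the stability interval with the reconfiguration starting at "now", and all function and parameter names are hypothetical.

```python
def profit_of_doing_nothing(current_rate, stability_interval):
    """Total profit if the current deployment is kept for the whole interval."""
    return current_rate * stability_interval


def profit_of_reconfiguring(during_rate, after_rate,
                            reconfig_duration, stability_interval):
    """Total profit if reconfiguration starts now.

    The interval is split into a reconfiguration phase (usually with a
    lower profit rate) and a phase running the new deployment.
    """
    # If reconfiguring takes longer than the horizon, no time is left
    # to benefit from the new deployment.
    during = min(reconfig_duration, stability_interval)
    after = stability_interval - during
    return during_rate * during + after_rate * after


def should_adapt(current_rate, during_rate, after_rate,
                 reconfig_duration, stability_interval):
    """Adapt only if reconfiguring yields strictly more total profit."""
    return (
        profit_of_reconfiguring(during_rate, after_rate,
                                reconfig_duration, stability_interval)
        > profit_of_doing_nothing(current_rate, stability_interval)
    )


# Illustrative numbers (profit per hour, durations in hours).
print(should_adapt(10.0, 4.0, 14.0, 0.5, 6.0))  # long stability interval
print(should_adapt(10.0, 4.0, 14.0, 0.5, 1.0))  # short stability interval
```

With these made-up numbers, the same reconfiguration pays off over a 6-hour stability interval (79 versus 60) but not over a 1-hour interval (9 versus 10), which is why adaptation costs have to be weighed together with adaptation benefits.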
The approach is effective in optimising performance and cost
►Evaluated the approach using experiments with a web application on multiple clouds
►Showed effectiveness of reconfiguration validation
►Demonstrated PaaSage using applications developed by partners
33
How to enable legacy epidemic simulation applications to run in the cloud?
Challenges
►decomposing applications into services
►facilitating their deployment and operation on multiple clouds
►supporting elasticity and handling failures
34
DiFFuSE supports structuring simulators as distributed, interacting services
►Provides reusable code for commonly required functionality
►Provides data exchange mechanisms supporting replication and failure handling
►Builds on PaaSage to support multi-cloud deployment and elasticity
[CPE’20][CloudCom’17] With L. Pham, C. Morin, S. Arnoux, G. Beaunée, L. Qi, P. Gontier, P. Ezanno
35
DiFFuSE enables simulation applications to fully exploit cloud platforms
►Used to restructure two legacy epidemic simulators developed by INRAE
§ Spread of bovine viral diarrhea virus (BVDV)
§ Spread of Mycobacterium avium subspecies paratuberculosis (MAP)
36
Applying DiFFuSE required a small number of code changes
►BVDV application
§ 6.3% of original code was modified
►MAP application
§ 1.78% of original code was modified
37
Applying DiFFuSE brought many benefits
►Experiments using the two applications showed that DiFFuSE
§ can exploit variable numbers and types of cloud resources
§ handles failures automatically
§ allows elastically modifying resource allocations
38
To sum up
►Tools for adapting multi-cloud applications and migrating legacy simulation applications to the cloud
►Exploited by academic and industrial partners
§ PaaSage open source platform
§ DiFFuSE open source framework
►Using DiFFuSE remains difficult for scientists
39
How to share a private infrastructure among customers?
►Provider: "Allocate my resources efficiently"
►Customers: "Need results as soon as possible", "Need results before 9 am", "Need throughput >50K tps"
41
Existing solutions rely on priorities or quotas assigned by administrators
►Assume that administrators know the value of resources for all customers at all times or that customers truthfully declare such values
42
We applied a market-based approach together with application adaptation
►Provider distributes fine-grained virtual resources through an auction
►Customer employs controllers that dynamically adapt the resource demand
[S. Costache PhD] [JPDC'17][EuroPar'13][CloudCom’13] [HPCC'12][VHPC’11] With S. Costache, C. Morin, S. Kortas
43
[Figure: a Resource Market, applications each with their own controller, and VMs running on the physical infrastructure]
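Receiving bids
[Figure: in each period, the VM Scheduler receives bids $(c_{\mathrm{cpu}}^{1}, c_{\mathrm{mem}}^{1})$, $(c_{\mathrm{cpu}}^{2}, c_{\mathrm{mem}}^{2})$, $(c_{\mathrm{cpu}}^{3}, c_{\mathrm{mem}}^{3})$; nodes are shown alongside the scheduler]
45
Calculating allocations
[Figure: VM Scheduler, period, nodes]
Ideal allocations: $B_i = \frac{c_i}{\sum_j c_j} D$
46
Placing VMs
[Figure: VM Scheduler, period, nodes]
47
Charging
[Figure: VM Scheduler, period, nodes]
48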
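The "Calculating allocations" step can be sketched as a proportional-share computation, where each VM receives a share of a resource in proportion to its bid. This is a minimal sketch, not the scheduler's actual code: reading the slide's formula as "bid divided by the sum of bids, times a capacity" is an assumption, and the names `ideal_allocations`, `bids`, and `capacity` are hypothetical. It handles a single resource; CPU and memory bids would be processed separately in the same way.

```python
def ideal_allocations(bids, capacity):
    """Split `capacity` among VMs in proportion to their bids.

    bids: mapping from VM identifier to a non-negative bid.
    capacity: total amount of the resource to distribute (assumed meaning of D).
    """
    if any(bid < 0 for bid in bids.values()):
        raise ValueError("bids must be non-negative")
    total = sum(bids.values())
    if total == 0:
        # Nobody bid anything, so nothing is allocated.
        return {vm: 0.0 for vm in bids}
    return {vm: capacity * bid / total for vm, bid in bids.items()}


# Illustrative CPU bids for three VMs sharing 8 cores.
cpu_bids = {"vm1": 2.0, "vm2": 1.0, "vm3": 1.0}
print(ideal_allocations(cpu_bids, capacity=8.0))
```

A consequence of this rule is that a VM's allocation depends on the other bids as well as its own, which is what lets controllers raise or lower bids to adapt their resource demand.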
The approach provides policies for vertical and horizontal scaling
[Figure: the controller's policy takes prices, performance, the SLO, and the budget as inputs and decides actions: modify bids or number of VMs, suspend application, …]
49
The approach supports various types of applications and SLOs
►MPI applications with deadline-based or best-effort SLOs
►Condor with SLO of increasing throughput
►Torque with SLO of reducing wait time
►Demonstrated adaptation to changes in resource availability and SLO
50
The approach enables customers to satisfy their SLOs
[Figure: (a) total customer satisfaction (10^3 credits) and (b) percentage of successfully finished applications, each plotted against the inter-arrival factor for FCFS, EDF, and Merkat]
Simulations with large traces (120 nodes)
51
The approach enables customers to satisfy their SLOs
Platform   % Met Deadlines   Satisfaction
Merkat     82.5%             764330
Maui       58.1%             -178581
Testbed experiments (10 nodes)
52
To sum up
►Generic and extensible application and resource management approach
►Efficient resource allocations
►Developing policies remains a complex and ad hoc process
►More suitable for completely decentralised systems
53
Some findings
►SLAs simplify decision making
►Application management requires considering reconfiguration costs as well as benefits
►The market-based approach is an effective way to coordinate interactions among applications and the platform
55
Outlook
Management solutions for additional domains
►Function-as-a-service (FaaS)
§ Lack of QoS support
►Fog computing
§ Higher level of dynamism and heterogeneity
§ More economic actors
[Y. Bouizem PhD]
56
Outlook
Next generation management methods
►Decision-making
§ Machine learning techniques
►Decentralised management
§ Economic and pricing mechanisms
57
Thank you
André Lage-Freitas, Stefania Costache, Djawida Dib, Carlos Ruiz Diaz, Linh Manh Pham, Arnab Sinha, Yasmina Bouizem, Baptiste Goupille-Lescar, Christine Morin, Hector Duran-Limon, Jean-Louis Pazat, Samuel Kortas, Eric Lenormand, Yvon Jégou, Sandie Arnoux, Gaël Beaunée, Luyuan Qi, Philippe Gontier, Pauline Ezanno, MYRIADS team, …
58
References
► [CPE'20] N. Parlavantzas, L. M. Pham, C. Morin, S. Arnoux, G. Beaunée, L. Qi, P. Gontier, and P. Ezanno. “A Service-Based Framework for Building and Executing Epidemic Simulation Applications in the Cloud”. Concurrency and Computation: Practice and Experience 32:5, 2020
► [PDP'18] N. Parlavantzas, L. M. Pham, A. Sinha, and C. Morin. “Cost-Effective Reconfiguration for Multi-Cloud Applications”. 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). Cambridge, 2018
► [JPDC'17] S. Costache, S. Kortas, C. Morin, and N. Parlavantzas. “Market-Based Autonomous Resource and Application Management in Private Clouds”. Journal of Parallel and Distributed Computing 100, 2017
► [CPE'17a] A. Lage Freitas, N. Parlavantzas, and J. Pazat. “Cloud Resource Management Driven by Profit Augmentation”. Concurrency and Computation: Practice and Experience 29:4, 2017
► [CPE'17b] D. Dib, N. Parlavantzas, and C. Morin. “SLA-Based PaaS Profit Optimization”. Concurrency and Computation: Practice and Experience 29:21, 2017
► [CloudCom'17] L. M. Pham, N. Parlavantzas, C. Morin, S. Arnoux, L. Qi, P. Gontier, and P. Ezanno. “DiFFuSE, a Distributed Framework for Cloud-Based Epidemic Simulations: A Case Study in Modelling the Spread of Bovine Viral Diarrhea Virus”. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). Hong Kong, 2017
► [ARMS'17] C. Ruiz, H. A. Duran-Limon, and N. Parlavantzas. “An RLS Memory-Based Mechanism for the Automatic Adaptation of VMs on Cloud Environments”. 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing-ARMS-CC’17. Washington, DC, USA, 2017
► [UCC'16] C. Ruiz, H. A. Duran-Limon, and N. Parlavantzas. “Towards a Software Product Line-Based Approach to Adapt IaaS Cloud Configurations”. 9th International Conference on Utility and Cloud Computing - UCC ’16. Shanghai, China, 2016
► [IOT360'16] T. Kirkham, A. Sinha, N. Parlavantzas, B. Kryza, P. Fremantle, K. Kritikos, and B. Aziz. “Privacy Aware On-Demand Resource Provisioning for IoT Data Processing”. In: Internet of Things. IoT Infrastructures. 2016
► [CCGrid'14] D. Dib, N. Parlavantzas, and C. Morin. “SLA-Based Profit Optimization in Cloud Bursting PaaS”. 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 2014
► [EuroPar'13] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “On the Use of a Proportional-Share Market for Application SLO Support in Clouds”. In: Euro-Par 2013 Parallel Processing. Berlin, Heidelberg, 2013
► [CloudCom'13] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “Merkat: A Market-Based SLO-Driven Cloud Platform”. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science. IEEE, 2013
► [ORMaCloud'13] D. Dib, N. Parlavantzas, and C. Morin. “Meryn: Open, SLA-Driven, Cloud Bursting PaaS”. First ACM Workshop on Optimization Techniques for Resources Management in Clouds. ORMaCloud ’13. ACM, New York, New York, USA
► [HPCC'12] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “Themis: Economy-Based Automatic Resource Scaling for Cloud Systems”. In: 2012 IEEE 14th International Conference on High Performance Computing and Communication. 2012
► [CLOUD'12] A. L. Freitas, N. Parlavantzas, and J.-L. Pazat. “An Integrated Approach for Specifying and Enforcing SLAs for Cloud Services”. In: Proceedings of the 5th International Conference on Cloud Computing (CLOUD). IEEE, 2012
► [ECOWS'11] A. Lage Freitas, N. Parlavantzas, and J.-L. Pazat. “Cost Reduction through SLA-Driven Self-Management”. In: 2011 IEEE Ninth European Conference on Web Services. IEEE, 2011
► [VHPC'11] S. Costache, N. Parlavantzas, C. Morin, and S. Kortas. “An Economic Approach for Application QoS Management in Clouds”. Euro-Par 2011: Parallel Processing Workshops. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012
► [MONA'10] A. Lage Freitas, N. Parlavantzas, and J.-L. Pazat. “A QoS Assurance Framework for Distributed Infrastructures”. 3rd International Workshop on Monitoring, Adaptation and Beyond - MONA ’10. ACM Press, New York, New York, USA, 2010
► [CIT'10] A. Lage-Freitas, N. Parlavantzas, and J.-L. Pazat. “A Self-Adaptable Approach for Easing the Development of Grid-Oriented Services”. IEEE International Conference on Computer and Information Technology (CIT). Bradford, UK, 2010