

SLIDE 1

A Cost-Sensitive Adaptation Engine for Server Consolidation of Multitier Applications

Gueyoung Jung, Calton Pu

Georgia Institute of Technology

Kaustubh Joshi, Matti Hiltunen, Richard Schlichting

AT&T Labs Research

SLIDE 2

Middleware 2009

  • Nov. 30, 2009

Context

  • Cloud infrastructures providing a resource pool shared by multiple applications

  • Multi-tier applications (e.g., web, application, and database tier servers)

  • Server consolidation through virtualization, allowing each physical machine to host tier servers in isolated containers (VMs)

  • Performance optimization in the context of end-to-end response time through resource configuration

  • Resource configuration: CPU capacity tuning of VMs, VM migration, and increasing/decreasing the replication level

SLIDE 3

Motivations

  • Dynamic resource configuration has become crucial in consolidated server environments.

  • Adaptation actions such as VM migration and replication make it more feasible.

  • Challenge: indiscriminate use of adaptations can have a significant impact on the overall QoS of hosted applications.

[Figure: response time over time of day (15:00 to 22:28) under the cost-oblivious strategy]

SLIDE 4

Adaptation Benefit vs. Cost

[Figure: workload over time; the span between t0 and t1 is the workload stability interval]

SLIDE 5

Adaptation Benefit vs. Cost

[Figure: response time over time between t0 and t1. Curve labels: No Adaptation, Desired, Cheap Adaptation, Expensive Adaptation, Optimal. Annotations: Adaptation Delay, Delta Response Time.]

SLIDE 6

Adaptation Costs

  • Vary with workload, adaptation type, and performance characteristics of applications and their tier servers

[Figure, two panels. Change of response time: delta response time (ms) vs. number of concurrent sessions (100 to 800). Change of adaptation delay: adaptation delay (ms) vs. number of concurrent sessions (100 to 800).]

SLIDE 7

Adaptation Costs

  • Affected by the workload of background applications sharing resources with the adapted application (virtualization overheads)

[Figure, two panels. Change of response time: ΔResponse Time (ms) vs. # Users (Adapted App) and # Users (Background App). Change of adaptation delay: Adaptation Delay (ms) vs. # Users (Adapted App) and # Users (Background App).]

SLIDE 8

Contributions

  • Develop a framework that finds an optimal balance between the benefits and costs of various adaptations.

  • Demonstrate the framework in dynamic resource configuration for multiple multi-tier applications in a server consolidation environment.

  • Specifically, generate a set of optimal adaptation actions to maximize overall utility.

SLIDE 9

Architecture

[Figure: Adaptation Engine architecture. Labels: Workload Monitor, Estimator, Controller, Search Algorithm, Optimizer, Utility Function, LQN Model, LQN Solver, Cost Model, Cost Mapping, ARMA Filter, Off-line experiments, Application, Resource, Adapt. Action, Optimal Adapt. Action. Signals: W, RT, U*.]

  • Build cost and benefit models offline
  • Estimate costs and benefits by solving the models online with given workloads
  • Estimate workload stability
  • Generate a set of optimal adaptation actions using the estimates and utilities
  • Unify cost and benefit

SLIDE 10

Modeling

  • Estimation of benefits: Given workload W and configuration c, estimate response time RT using layered queueing network (LQN) models.

[Figure: layered queueing network with a client and web, application, and DB servers, each with CPU, disk, and network resources on a VMM. Parameter sources: network ping measurement, Servlet.jar instrumentation, LD_PRELOAD instrumentation. Legend: function call, resource use.]
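
As a rough illustration of the benefit estimate (workload W and configuration c in, response time RT out), the sketch below replaces the layered queueing network with a much simpler stand-in: each tier is treated as an independent M/M/1 queue whose service rate scales with the CPU share given to its VM. The tier names, base service rates, and the function name are hypothetical and chosen only for this example; the model in the talk is an LQN parameterized from measurements.

```python
def estimate_response_time(request_rate, cpu_shares, base_service_rates):
    """Return mean end-to-end response time in seconds (inf if a tier saturates).

    request_rate: arrivals per second (the workload W).
    cpu_shares: tier -> fraction of a CPU given to that tier's VM (the configuration c).
    base_service_rates: tier -> requests/second the tier serves with a full CPU.
    """
    total = 0.0
    for tier, share in cpu_shares.items():
        service_rate = base_service_rates[tier] * share
        if service_rate <= request_rate:
            return float("inf")  # unstable queue: response time is unbounded
        total += 1.0 / (service_rate - request_rate)  # M/M/1 mean time in system
    return total


# Hypothetical numbers: doubling the application tier's CPU share lowers RT.
rates = {"web": 400.0, "app": 200.0, "db": 250.0}
small = {"web": 0.4, "app": 0.4, "db": 0.4}
large = {"web": 0.4, "app": 0.8, "db": 0.4}
rt_small = estimate_response_time(50.0, small, rates)
rt_large = estimate_response_time(50.0, large, rates)
print(f"RT small={rt_small * 1000:.1f} ms, RT large={rt_large * 1000:.1f} ms")
```

The difference between the two estimates is the kind of benefit that a capacity increase would be credited with under this simplified model.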

SLIDE 11

Modeling (cont.)

  • Estimation of costs: Given workload W, configuration c, and adaptation action a, experimentally and iteratively measure the delay d_a and the delta response time ΔRT_a. Then generate a mapping table to be used online.

  • Estimation of workload stability CW: Given the workload history, estimate how long the current workload will stay within a range using an ARMA filter.
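
One way to picture the stability estimate CW: record how long each past workload level stayed within a band, then predict the next interval by blending the most recent measurement with the mean of a short history. This is only a moving-average blend standing in for the ARMA filter; the band width, the weight `beta`, the history length, and both function names are assumptions made for illustration and do not come from the talk.

```python
def completed_intervals(workload, band):
    """Lengths (in samples) of completed runs that stay within +/- band of the run's first sample."""
    intervals = []
    start = 0
    for i in range(1, len(workload)):
        if abs(workload[i] - workload[start]) > band:
            intervals.append(i - start)  # the run that began at `start` has ended
            start = i
    return intervals  # the still-open final run is deliberately left out


def predict_next_interval(measured, beta=0.5, history=3):
    """Blend the latest measured interval with the mean of the last `history` ones."""
    if not measured:
        raise ValueError("need at least one completed interval")
    recent = measured[-history:]
    return (1.0 - beta) * measured[-1] + beta * sum(recent) / len(recent)


# Hypothetical request-rate trace, one sample per monitoring interval.
trace = [10, 11, 10, 12, 20, 21, 19, 20, 22, 35, 36]
measured = completed_intervals(trace, band=5)
predicted = predict_next_interval(measured)
print(measured, predicted)
```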

SLIDE 12

Utility Function

[Figure: penalty / reward as a function of request rate (per second), with separate reward and penalty curves]

Each application s has its own SLA that provides a target response time TRT^s, a base reward R^s, and a base penalty P^s. R^s and P^s are scaled by the intensity of the workload W^s.

  • Maximize

\[
U = \Big( CW - \sum_{a_k \in A} d_{a_k} \Big) \sum_{s \in S} u^{s}_{c_{i+1}} + \sum_{a_k \in A} \Big( d_{a_k} \sum_{s \in S} u^{s}_{c_{k-1},\, a_k} \Big)
\]

The first term is the benefit utility, accrued in the new configuration for the rest of the stability interval; the second term is the cost utility, accrued while the adaptation actions execute.

Where CW is the estimated length of the stability interval.
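
A small worked example may make the trade-off in U concrete. The sketch assumes the per-application utilities have already been summed over all applications, so each action is described by its delay d and the summed utility rate while it runs, and the final configuration by one summed rate. The function name and all numbers are hypothetical.

```python
def overall_utility(cw, actions, final_utility_rate):
    """Evaluate U for one action sequence.

    cw: estimated stability interval length (seconds).
    actions: list of (delay_seconds, utility_rate_while_action_runs).
    final_utility_rate: summed utility rate in the configuration reached at the end.
    """
    total_delay = sum(delay for delay, _ in actions)
    cost_part = sum(delay * rate for delay, rate in actions)
    benefit_part = (cw - total_delay) * final_utility_rate
    return benefit_part + cost_part


# Hypothetical 10-minute stability interval.
do_nothing = overall_utility(600.0, [], 1.0)
cheap = overall_utility(600.0, [(10.0, -1.0)], 2.0)
expensive = overall_utility(600.0, [(120.0, -3.0)], 2.5)
print(do_nothing, cheap, expensive)  # 600.0 1170.0 840.0
```

With these numbers the expensive action reaches the better configuration, yet the cheap action wins because less of the stability interval is spent paying for the adaptation.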

SLIDE 13

Optimization

  • Search for the desired configuration using bin-packing and gradient-based search algorithms.
  • This step does not consider adaptation costs.

[Figure: search from the maximum configuration c_max through candidates c_new1, c_new2, c_new3, ..., c_newn down to the desired configuration. The gradient annotation is built from the utility change $U_{new} - U_{old}$ and the change $\sum_{i,j,k} \rho^{new}_{i,j,k} - \sum_{i,j,k} \rho^{old}_{i,j,k}$.]

  • Generate optimal actions using a best-first search algorithm.
  • A* graph search algorithm.

[Figure: search graph rooted at the current configuration c0 (no adaptation). Actions a1, a2, a3, ..., an lead to configurations c1, c2, c3, ..., cn; applying the actions again to c3 gives c31, c32, c33, ..., c3n, and so on until the optimal configuration is reached.]
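
The bin-packing part of the search can be pictured as placing VM CPU demands onto hosts. The sketch below uses first-fit decreasing purely as a stand-in, since the slides do not say which packing heuristic is used; the VM names, the demands (in percent of one host's CPU), and the function name are hypothetical.

```python
def first_fit_decreasing(demands, host_capacity):
    """Pack VM CPU demands onto hosts; returns a list of hosts, each a list of VM names."""
    hosts = []  # each entry is [remaining_capacity, [vm names]]
    for vm, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for host in hosts:
            if host[0] >= demand:  # first host with enough room
                host[0] -= demand
                host[1].append(vm)
                break
        else:
            hosts.append([host_capacity - demand, [vm]])  # open a new host
    return [names for _, names in hosts]


# Hypothetical CPU demands in percent of one host.
demands = {"web1": 30, "app1": 60, "db1": 50, "web2": 20, "app2": 40}
placement = first_fit_decreasing(demands, host_capacity=100)
print(placement)  # [['app1', 'app2'], ['db1', 'web1', 'web2']]
```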

SLIDE 14

Test-bed Architecture

  • Develop a small virtualized data center
  • Deploy multiple 3-tier RUBiS applications

[Figure: test-bed layout. Active hosts each run a hypervisor with Domain-0 and VMs hosting web, application, and DB servers. A virtual machine pool holds dormant application server and DB server VMs. The Adaptation Engine consists of the Workload Monitor, Estimator, and Controller. VM images reside on shared storage.]

SLIDE 15

Workloads

[Figure, two panels, request rate (per sec) over time for RUBiS-1 and RUBiS-2. Time of day trace: 15:00 to 22:17. Flash crowd trace: 15:00 to 16:25.]

SLIDE 16

Model Accuracy

[Figure, two panels. Response time (msec) over time, 15:02 to 19:28, experiment (Exp.) vs. model. Interval (min) per prediction window, monitored vs. estimated.]

  • Estimation error of benefits and costs is around 15%
  • Estimation error of workload stability is around 15%

SLIDE 17

Comparison Evaluation

  • Compare 5 strategies
  • NA: No adaptation. A static configuration is used.
  • CO: Cost-Oblivious. No consideration of adaptation costs.
  • Oracle (Desired): Simulation with adaptation costs = 0.
  • 1-hour: Adaptation every 1 hour.
  • CS: Cost-Sensitive.

SLIDE 18

End-to-End Response Times

[Figure, two panels, response time (ms) over time for NA, CS, and CO. Time of day trace: 15:00 to 22:12. Flash crowd trace: 15:00 to 16:24.]

SLIDE 19

Cumulative Utilities

[Figure, two panels, cumulative utility over time for NA, CS, CO, Oracle, and 1-hour. Time of day trace: 15:00 to 22:00. Flash crowd trace: 15:00 to 16:24.]

SLIDE 20

Adaptation Actions

The number of adaptation actions triggered in the flash crowd scenario

Type of Action               CS   CO
CPU Increase and Decrease    14   36
Add MySQL replica             1    4
Remove MySQL replica          1    4
Migrate Apache                4   10
Migrate Tomcat                4   10
Migrate MySQL                 2   (only one value appears in the source for this row)

SLIDE 21

Conclusion

  • Adaptation actions such as VM replication and migration can impose significant performance costs.

  • Our approach makes smart decisions on when and how to act to improve the satisfaction of response time SLAs.

SLIDE 22

On-going Work

  • Integrate power management into the problem formulation by considering power consumption as another cost.

  • Handle large-scale setups by extending the framework to multi-level hierarchical control, where each level represents a different time scale and scope of control.

SLIDE 23

Questions?

SLIDE 24

Multi-Level Control

[Figure: timeline split into monitoring intervals (mi). Labels: Periodic controller actions, On-demand controller actions, Stability Interval, configurations c_k (old config), c_{k+1}, c_{k+2}, c_{k+3}.]

The periodic controller generates relatively cheap actions for short intervals in response to any workload change. The on-demand controller generates relatively expensive actions for longer intervals in response to workload changes of a certain degree.

SLIDE 25

Utility Function (detail)

[Figure: penalty / reward as a function of request rate (per second), with separate reward and penalty curves]

Each application s has its own SLA that provides a target response time TRT^s, a base reward R^s, and a base penalty P^s. R^s and P^s are scaled by the intensity of the workload W^s.

  • Per-application utility:

\[
u^{s} = \mathbf{1}[RT^{s} \le TRT^{s}] \, \frac{R^{s}(W^{s})}{M} - \mathbf{1}[RT^{s} > TRT^{s}] \, \frac{P^{s}(W^{s})}{M}
\]

where M is the length of the unit interval.

  • Maximize

\[
U = \Big( CW - \sum_{a_k \in A} d_{a_k} \Big) \sum_{s \in S} u^{s}_{c_{i+1}} + \sum_{a_k \in A} \Big( d_{a_k} \sum_{s \in S} u^{s}_{c_{k-1},\, a_k} \Big)
\]

  • The cost utility u^s_a can be computed by replacing RT^s with RT^s_a = RT^s + ΔRT^s_a in the equation for u^s.
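
The per-application utility and its cost variant can be written as two small functions. This is a minimal sketch: the reward and penalty arguments are assumed to be R^s(W^s) and P^s(W^s) already evaluated at the current workload, and the function names and sample values are hypothetical.

```python
def app_utility(rt, target_rt, reward, penalty, unit_interval):
    """Utility rate: reward / M when the target is met, -penalty / M otherwise."""
    if rt <= target_rt:
        return reward / unit_interval
    return -penalty / unit_interval


def app_utility_during_action(rt, delta_rt, target_rt, reward, penalty, unit_interval):
    """Cost utility: same rule applied to the inflated response time RT + delta RT."""
    return app_utility(rt + delta_rt, target_rt, reward, penalty, unit_interval)


# Hypothetical SLA: 100 ms target, reward 40 and penalty 60 per 60-second unit interval.
steady = app_utility(80.0, 100.0, 40.0, 60.0, 60.0)
adapting = app_utility_during_action(80.0, 50.0, 100.0, 40.0, 60.0, 60.0)
print(steady, adapting)
```

In this example the application meets its target in steady state but misses it while an action adds 50 ms, so the rate flips from a reward to a penalty for the duration of the action.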

SLIDE 26

Optimization Algorithm

  • 1. Start from the current configuration c0.
  • 2. Compute the upper bound utility u*.
  • 3. Explore adaptation actions including “do nothing” if the node is a candidate node.
  • 4. Compute the utility of each node and store it in V. For each intermediate node, use u*, e.g., U_1 = (CW - d_{a_1}) u* + d_{a_1} u_{a_1}.
  • 5. Sort the nodes in V by decreasing utility and select the first node.
  • 6. If the node is a candidate node, stop the search and return the actions used to reach the node. Otherwise, keep searching from the node.

[Figure: search tree rooted at c0 with candidate and intermediate nodes. Actions a1, a2, ..., an expand c0 into c1, c2, c3, ..., cn; further levels include c3,1 ... c3,n, c3,2,1 ... c3,2,n, c3,2,1,1 ... c3,2,1,n, c1,1 ... c1,n, and c1,2,1 ... c1,2,n. "Do nothing" edges lead to nodes such as c'0 and c'1,2.]
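
The six steps can be sketched as a best-first search over a priority queue. The sketch rests on several assumptions that are not stated on the slide: "do nothing" is read as the move that turns an intermediate node into a candidate node, every action has a positive delay, u* bounds every utility rate from above, and configurations are opaque hashable values supplied through two hypothetical callbacks.

```python
import heapq
import itertools


def best_first_adaptation(c0, cw, actions, utility_rate, upper_bound_rate):
    """Return (utility, action names) of the first candidate node selected.

    c0: current configuration.
    cw: estimated stability interval length.
    actions(config): iterable of (name, next_config, delay, rate_during_action).
    utility_rate(config): steady-state utility rate in that configuration.
    upper_bound_rate: u*, assumed >= every rate the callbacks can return.
    """
    tie = itertools.count()  # unique tie-breaker so configs are never compared
    # Entries: (-utility, tie, is_candidate, config, elapsed, accrued, path)
    heap = [(-cw * upper_bound_rate, next(tie), False, c0, 0.0, 0.0, ())]
    while heap:
        neg_u, _, is_candidate, config, elapsed, accrued, path = heapq.heappop(heap)
        if is_candidate:
            return -neg_u, list(path)  # step 6: best node is a candidate, stop
        # "Do nothing": stay in this configuration for the rest of the interval.
        final = accrued + (cw - elapsed) * utility_rate(config)
        heapq.heappush(heap, (-final, next(tie), True, config, elapsed, accrued, path))
        for name, nxt, delay, rate in actions(config):
            if elapsed + delay > cw:
                continue  # the action would outlast the stability interval
            new_elapsed = elapsed + delay
            new_accrued = accrued + delay * rate
            # Step 4: intermediate nodes are scored optimistically with u*.
            optimistic = new_accrued + (cw - new_elapsed) * upper_bound_rate
            heapq.heappush(heap, (-optimistic, next(tie), False, nxt,
                                  new_elapsed, new_accrued, path + (name,)))
    return None


# Hypothetical three-configuration example.
RATES = {"small": 1.0, "medium": 2.0, "large": 2.5}
MOVES = {
    "small": [("cpu+", "medium", 10.0, -1.0), ("migrate", "large", 120.0, -3.0)],
    "medium": [("migrate", "large", 120.0, -3.0)],
    "large": [],
}
short_run = best_first_adaptation("small", 600.0, MOVES.__getitem__, RATES.__getitem__, 2.5)
long_run = best_first_adaptation("small", 3000.0, MOVES.__getitem__, RATES.__getitem__, 2.5)
print(short_run, long_run)
```

Because intermediate nodes are scored with the optimistic bound u*, a candidate node reaches the front of the queue only when no unexplored branch could still beat it, which is the usual A*-style argument. In the example the cheap CPU change is chosen for the 600-second interval, while the expensive migration pays off when the interval is 3000 seconds.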