slide-1
SLIDE 1

JustRunIt: Experiment‐Based Management of Virtualized Data Centers

Wei Zheng, Ricardo Bianchini (Rutgers University)
Yoshio Turner, Renato Santos, John Janakiraman (HP Labs)

slide-2
SLIDE 2

Motivation

  • Managing a data center is a challenging task

– Resource allocation, evaluation of software/hardware upgrades, capacity planning, etc.
– Decisions affect performance, availability, energy consumption

  • State-of-the-art uses modeling for these tasks

– Models give insight into system behavior
– Fast exploration of large parameter spaces

  • Modeling has some important drawbacks

– Consumes a very expensive resource: human labor
– Needs to be re-calibrated and re-validated as the systems evolve

slide-3
SLIDE 3

Our Approach

  • Idea: experiments are a better approach

– Consume a cheaper resource: machine time (and energy)
– High fidelity

  • JustRunIt: an infrastructure for experiment-based management of virtualized data centers

  • Management system or administrator can use JustRunIt results to perform management tasks

– Resource management and hardware/software upgrades
– Select the best value for software tunables
– Evaluate the correctness of administrator actions

slide-4
SLIDE 4

Outline

  • Motivation
  • JustRunIt design and implementation
  • Evaluation

– Case study 1: resource management
– Case study 2: hardware upgrades

  • Related work
  • Conclusion
slide-5
SLIDE 5

Target Environment

  • Virtualized data centers host multiple independent Internet services

  • Each service comprises multiple tiers, e.g. a web tier, an application tier, and a database tier

  • Each service has strict negotiated SLAs (Service Level Agreements), e.g. response time

  • All services are hosted in VMs for isolation, easy migration, management flexibility

slide-6
SLIDE 6

Data Center with JustRunIt

  • Creates sandbox
  • Clones VMs
  • Applies configuration changes
  • Duplicates live workload to sandbox
  • Properties

– No effect on on-line services
– Does not replicate entire service
– Almost service-independent

Assess performance and energy of different configurations

Figure: the on-line system hosts VMs labeled A1 to A3, W1 to W3, and D1 to D3 across its nodes; the sandbox holds the clones S-A2, S-W2, and S-D2.

slide-7
SLIDE 7

JustRunIt Architecture

Figure: JustRunIt architecture. The Management Entity supplies parameter ranges, heuristics, and a time limit. JustRunIt comprises the Experimenter, Driver, Interpolator, and Checker, which exchange parameter values and experiment results. The results matrix over param1 and param2 holds cells marked X, I, and T.

slide-8
SLIDE 8

Experimenter

  • Step 1: Clone subset of production system to a sandbox

– VM cloning: Modify Xen live migration to resume original VM instead of destroying it
– Storage cloning: LVM copy-on-write snapshot for sandbox VM
– L2/L3 network address translation: implemented in driver domain netback driver to prevent network address conflict

  • Step 2: Apply configuration changes

– Examples: CPU allocation, frequency

slide-9
SLIDE 9

Experimenter

  • Step 3: Duplicate live workload to sandbox using proxies
  • Proxies filter requests/replies from the sandbox VM
  • Proxies emulate the timing and functional behavior of preceding and following service tiers

– Application protocol level requests/replies (e.g. HTTP)

Figure: Tier-N VM, In-Proxy, Out-Proxy, Sandbox VM
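
The duplication step can be pictured with a toy in-proxy: each live request is served by the on-line tier as usual, and a copy is handed to the sandbox VM, whose reply is recorded but never returned to the client. This is only a sketch under stated assumptions. `InProxy`, `online_handler`, and `sandbox_handler` are hypothetical names, requests are plain strings, and the proxies described in the slides work at the application protocol level (e.g. HTTP) and are written in C.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


class InProxy:
    """Toy in-proxy: serve clients from the on-line tier, mirror traffic to the sandbox."""

    def __init__(self, online_handler: Handler, sandbox_handler: Handler) -> None:
        self.online_handler = online_handler      # hypothetical on-line tier
        self.sandbox_handler = sandbox_handler    # hypothetical sandbox VM
        # (request, sandbox reply) pairs kept only for later analysis.
        self.sandbox_log: List[Tuple[str, str]] = []

    def handle(self, request: str) -> str:
        # The client always receives the on-line reply.
        reply = self.online_handler(request)
        # The duplicate goes to the sandbox; its reply is filtered out.
        try:
            self.sandbox_log.append((request, self.sandbox_handler(request)))
        except Exception:
            # A sandbox failure must not affect the on-line service.
            pass
        return reply
```

Keeping the sandbox call after the on-line call, and swallowing its errors, mirrors the "no effect on on-line services" property in this simplified setting.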
slide-10
SLIDE 10

JustRunIt Architecture

Figure: JustRunIt architecture diagram, repeated from slide 7.

slide-11
SLIDE 11

Driver

  • Goal: Fill in results matrix within a time limit

  • Corners
  • Midpoints (recursive)
  • Heuristics

– Cancel experiments if gain for a resource addition falls below a threshold
– Cancel experiments for tiers that do not produce the largest gains from a resource addition

Figure: results matrix with axes CPU allocation and CPU Freq, experiments marked X, origin at (min,min)
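
One way to read the "corners, then midpoints (recursive)" ordering is sketched below. It is an illustration, not the authors' driver: the function names are hypothetical, the time limit is approximated by a cap on the number of experiments (`max_experiments`), and the cancellation heuristics are left out.

```python
from typing import Dict, List, Tuple


def axis_levels(n: int) -> Dict[int, int]:
    """Map each index 0..n-1 to a refinement level: endpoints 0, midpoints 1, and so on."""
    levels = {0: 0, n - 1: 0}
    frontier = [(0, n - 1)]
    level = 1
    while frontier:
        next_frontier = []
        for lo, hi in frontier:
            if hi - lo < 2:
                continue  # no index strictly between lo and hi
            mid = (lo + hi) // 2
            levels.setdefault(mid, level)
            next_frontier += [(lo, mid), (mid, hi)]
        frontier = next_frontier
        level += 1
    return levels


def experiment_order(rows: int, cols: int, max_experiments: int) -> List[Tuple[int, int]]:
    """Order matrix cells so corners come first, then recursively refined midpoints."""
    row_lv, col_lv = axis_levels(rows), axis_levels(cols)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # A cell's priority is the coarsest level at which both of its indices exist.
    cells.sort(key=lambda rc: (max(row_lv[rc[0]], col_lv[rc[1]]), rc))
    return cells[:max_experiments]
```

For a 5 x 5 matrix, the first four cells returned are the corners, followed by the edge midpoints and the center. Cells left unvisited when the budget runs out would be candidates for interpolation.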

slide-12
SLIDE 12

JustRunIt Architecture

Figure: JustRunIt architecture diagram, repeated from slide 7.

slide-13
SLIDE 13

Interpolator and Checker

  • For simplicity, we use linear interpolation
  • Checker will verify the interpolated result by invoking the experimenter to run corresponding experiments in the background
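
A minimal sketch of the two roles, assuming one row of the results matrix with `None` for cells that were not measured. The function names and the 10% relative tolerance in `check` are assumptions made for illustration; the slides only state that interpolation is linear and that the checker re-runs experiments in the background.

```python
from typing import List, Optional


def interpolate_row(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries that lie between two measured entries by linear interpolation."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            filled[i] = values[lo] + frac * (values[hi] - values[lo])
    return filled


def check(interpolated: float, measured: float, tolerance: float = 0.1) -> bool:
    """Return True if the interpolated value is within a relative tolerance of the measurement."""
    return abs(interpolated - measured) <= tolerance * abs(measured)
```

Entries outside the measured range stay `None`, since linear interpolation has nothing to anchor them on one side.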

slide-14
SLIDE 14

Cost of JustRunIt

  • Building JustRunIt also needs human effort

– The most time-consuming part is the proxy implementation
– Current proxies understand HTTP, mod_jk, MySQL protocols
– Developed from an open source proxy daemon, each proxy needs 800 to 1500 new lines of C code

  • Cost of VM cloning: 42 lines of Python code in xend and 244 lines of C in netback driver

  • The engineering cost of JustRunIt can be amortized for any service based on the same protocols

slide-15
SLIDE 15

Outline

  • Motivation
  • JustRunIt design and implementation
  • Evaluation

– Case study 1: resource management
– Case study 2: hardware upgrades

  • Related work
  • Conclusion
slide-16
SLIDE 16

Methodology

  • 15 HP ProLiant C-class blades (8G, 2 Xeon dual-core) interconnected with Gbit network

  • 2 types of 3-tier Internet service

– RUBiS: online auction service modeled after eBay.com
– TPC-W: online book store modeled after Amazon.com

  • Xen 3.3 with Linux 2.6.18
  • Dom0 pinned to separate core for performance isolation
slide-17
SLIDE 17

Overhead on On-line Service?

  • 3-tier service with one node per tier; two nodes for proxies
  • Overhead exposed: slight response time (RT) degradation, no effect on throughput (TP)

slide-18
SLIDE 18

Fidelity of the Sandbox Execution?

Figure: two panels, Response Time and Throughput (reqs/s). Application server at 400 requests/second (similar results for higher load).

slide-19
SLIDE 19

Automated Management

Figure labels: Management Entity, Data Center, JustRunIt, Change, Result

slide-20
SLIDE 20

Case Study 1: Resource Management

  • Goal: consolidate the hosted services onto the smallest possible set of nodes, while satisfying all SLAs

  • Management entity invokes JustRunIt when response time SLA is violated, or when SLA is met by a large margin

  • Management entity uses performance-resource matrix to determine resource needs

  • Management entity performs bin packing (via simulated annealing) to minimize number of physical machines and number of VM migrations
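
The bin-packing step can be sketched as a small simulated-annealing search over VM-to-node assignments. Everything specific here is an assumption for illustration: the function names, the cost function (nodes in use plus `migration_weight` per migrated VM), the linear cooling schedule, and the single CPU dimension. The slides state only that simulated annealing minimizes the number of physical machines and VM migrations.

```python
import math
import random
from typing import List


def cost(assign: List[int], current: List[int], migration_weight: float) -> float:
    """Number of nodes in use plus a penalty for each VM that would migrate."""
    migrations = sum(1 for a, c in zip(assign, current) if a != c)
    return len(set(assign)) + migration_weight * migrations


def pack(demands: List[float], current: List[int], nodes: int,
         capacity: float = 1.0, migration_weight: float = 0.1,
         steps: int = 5000, seed: int = 0) -> List[int]:
    """Simulated annealing over assignments; `current` must already be feasible."""
    rng = random.Random(seed)
    assign = list(current)
    load = [0.0] * nodes
    for vm, node in enumerate(assign):
        load[node] += demands[vm]
    cur_cost = cost(assign, current, migration_weight)
    best, best_cost = list(assign), cur_cost
    for step in range(steps):
        temperature = max(1e-3, 1.0 - step / steps)  # linear cooling
        vm, target = rng.randrange(len(assign)), rng.randrange(nodes)
        source = assign[vm]
        if target == source or load[target] + demands[vm] > capacity + 1e-9:
            continue  # skip no-op moves and moves that would overload a node
        assign[vm] = target
        new_cost = cost(assign, current, migration_weight)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            # Accept the move and update the node loads.
            load[source] -= demands[vm]
            load[target] += demands[vm]
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(assign), new_cost
        else:
            assign[vm] = source  # reject the move
    return best
```

Here `demands` would come from the performance-resource matrix (the CPU share each VM needs to meet its SLA), and `current` is the present placement, so the migration penalty discourages moving VMs without a consolidation benefit.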

slide-21
SLIDE 21

Case Study 1: Resource Management

  • 9 blades: 2 for first tier; 2 for second tier; 2 for third tier; 3 for load balancing and storage service

  • 4 services are populated
  • Each VM allocated 50% CPU
  • SLA: 50 ms
  • Service 0 workload is increased to 1500 reqs/sec after 2 mins
slide-22
SLIDE 22

Resource Management with JustRunIt

Figure: timelines comparing JustRunIt (JRI) with modeling. 4 services on 11 nodes, SLA = 50 ms; increase load on S0; run 3 exps for 3 mins.

  • JRI phases: violating SLA, running experiments, migrating
  • Modeling phases: violating SLA, solving model, migrating

slide-23
SLIDE 23

Case Study 2: Hardware Upgrades

  • Goal: evaluate whether a hardware upgrade allows further consolidation and lower overall power consumption

  • JustRunIt uses one instance of new hardware in sandbox to determine the consolidation savings

  • Bin packing determines necessary number of new machines to accommodate production workload

slide-24
SLIDE 24

Case Study 2: Hardware Upgrades

  • Initial server uses 90% of one CPU core on old hardware (emulated using low frequency mode)

  • New machine (emulated using high frequency mode) requires 72%

  • This would allow further consolidation in a large system
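
The arithmetic behind the consolidation claim can be made explicit. The helper names and the fleet size of 100 servers are hypothetical, and the core count is only a lower bound on aggregate demand: the actual number of machines depends on the bin-packing step.

```python
def relative_cpu_saving(old_pct: int, new_pct: int) -> float:
    """Fraction of per-server CPU demand removed by the upgrade."""
    return (old_pct - new_pct) / old_pct


def min_cores(servers: int, pct_per_server: int) -> int:
    """Lower bound on cores: total demand in percent, rounded up to whole cores."""
    return -(-servers * pct_per_server // 100)


saving = relative_cpu_saving(90, 72)   # (90 - 72) / 90, a 20% reduction
old_cores = min_cores(100, 90)         # hypothetical fleet of 100 such servers
new_cores = min_cores(100, 72)
```

Going from 90% to 72% of a core cuts per-server demand by one fifth, so for the hypothetical 100 servers the aggregate demand falls from 90 to 72 cores.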

slide-25
SLIDE 25

Related Work

  • Modeling, feedback control, and machine learning for managing data centers [Stewart’05, Stewart’08, Padala’07, Padala’09, Cohen’04]
  • Scaled-down emulation of data centers [Gupta’06, Gupta’08]
  • Sandboxing and duplication for managing data centers [Nagaraja’04, Tan’05, Oliveira’06]
  • Running experiments quickly [Osogami’06, Osogami’07]
  • Selecting experiments to run [Zheng’07, Shivam’08]

slide-26
SLIDE 26

Conclusions

  • JustRunIt infrastructure combines well with automated management systems

  • Answers “what-if” questions realistically and transparently
  • Can support a variety of management tasks
  • Future investigation

– Tier interactions
– Different workload mix
– Build proxies for a database server

slide-27
SLIDE 27

Thank you!
Questions?