Capacity planning with phased workloads Arif Merchant Storage - - PowerPoint PPT Presentation


WOSP 98, Santa Fe, NM, 12-16 October 1998 Capacity planning with phased workloads Arif Merchant Storage Systems Program Computer Systems Laboratory Hewlett-Packard Laboratories, Palo Alto, CA Joint work with E. Borowsky, R. Golding, P.


SLIDE 1

WOSP '98, Santa Fe, NM, 12-16 October 1998

Capacity planning with phased workloads

Arif Merchant

Storage Systems Program
Computer Systems Laboratory
Hewlett-Packard Laboratories, Palo Alto, CA

Joint work with E. Borowsky, R. Golding, P. Jacobson, L. Schreier, M. Spasojevic and John Wilkes

10/16/98

SLIDE 2

1 slides-only.fm

Attribute-managed storage: A day in the life of a System Administrator

Need more capacity. Need better performance. Must add devices. UGH!... my head hurts!
Need high availability. Must rebalance the load. AAAGH!... Brain exploding!
Quality of service guarantees. Network attached storage. More demanding applications. Headache today? Migraine tomorrow!!!

SLIDE 3

Attribute-managed storage: Motivation

Growing complexity of storage systems
[Figure: capacity and complexity (growing number of disks) against time]

HP 300GB TPC-D benchmark list price: 42% SPUs, 37% Storage, 21% Software

Growing cost of ownership for storage systems
[Figure: cost ($); labels: initial disk purchase cost, management costs]

SLIDE 4

Attribute-managed storage: Opportunity

[Diagram labels: Clients; Services; Servers; Storage-utility interface; High speed back-end storage network; Virtual stores; Storage utility. Optional: direct client access to storage network.]

SLIDE 5

Attribute-managed storage: A closer look

[Diagram labels: To clients: main storage interface; To storage managers: storage-management interface; Storage utility; Virtual stores; Shared, network-attached storage devices; Distributed storage-management functions]

SLIDE 6

Attribute-managed storage: The goal

Say what you want, not how to do it!

  • business-critical availability
  • 100 IOs/sec
  • 200ms response time

RAID 3 data layout, across 5 of the disks on disk array F, using 64KB stripe size, 3MB dedicated buffer cache with 128KB sequential readahead buffer, delayed write-back with 1MB NVRAM buffer and max 10s residency time, dual 256Kb/s links via host interfaces 12.4.3 and 16.0.4, 1Gb/s trunk links between FibreChannel switches A-3 and B-1, …

SLIDE 7

Attribute-managed storage: The mechanism

[Diagram labels: applications; workload requirements; storage device abilities; assignment engine (solver); workload; storage-system configuration]

SLIDE 8

Attribute-managed storage: The assignment problem

SLIDE 9

Constraints: Does it fit?

❏ Capacity constraints
    ❏ Is there enough space?
❏ Availability constraints
    ❏ Is it up often enough?
❏ Performance constraints
    ❏ Is response time adequate? E.g.: Are 95% of requests satisfied within 0.2 sec?
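The three questions above can be read as one yes/no test per candidate device. The following is only a minimal sketch of that reading, not the solver described in the talk: the class names, field names, and the idea of checking a list of sampled response times are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    """What a store asks for (hypothetical field names)."""
    size_gb: float              # capacity needed
    min_availability: float     # e.g. 0.999
    max_response_s: float       # e.g. 0.2 seconds
    min_fraction_within: float  # e.g. 0.95 of requests


@dataclass
class DeviceEstimate:
    """What a device is estimated to offer (hypothetical field names)."""
    free_gb: float
    availability: float
    response_times_s: List[float]  # sampled or predicted response times


def fits(req: Requirements, dev: DeviceEstimate) -> bool:
    """Return True only if capacity, availability and performance all hold."""
    capacity_ok = req.size_gb <= dev.free_gb
    availability_ok = dev.availability >= req.min_availability
    if not dev.response_times_s:
        return False  # no evidence about performance, so do not accept
    within = sum(1 for r in dev.response_times_s if r <= req.max_response_s)
    performance_ok = within / len(dev.response_times_s) >= req.min_fraction_within
    return capacity_ok and availability_ok and performance_ok
```

With the slide's example target (95% of requests within 0.2 sec), a device whose samples show 97% of requests under 0.2 sec passes the performance test, and one showing 90% does not.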

SLIDE 10

Short Term Utilization: Intuition

Queues form in a stable system because of variation in the work arrival rate. Queueing delays can be controlled by controlling the variability in the work arrival rate.

[Figure: request arrivals; work in system]
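A small illustration of the intuition, assuming a single FIFO device with a fixed service time per request (both assumptions, chosen only to keep the sketch short). Two arrival patterns carry the same total work over one second, but only the bursty one builds a queue.

```python
def fifo_response_times(arrival_times, service_time):
    """Response time of each request at a single FIFO server."""
    free_at = float("-inf")  # time at which the server next becomes idle
    responses = []
    for t in sorted(arrival_times):
        start = max(t, free_at)          # wait if the server is busy
        free_at = start + service_time
        responses.append(free_at - t)    # queueing delay plus service
    return responses


SERVICE_TIME = 0.05                      # seconds per request (assumed)
smooth = [i * 0.1 for i in range(10)]    # one request every 0.1 s
bursty = [0.0] * 10                      # the same ten requests, all at once

smooth_worst = max(fifo_response_times(smooth, SERVICE_TIME))
bursty_worst = max(fifo_response_times(bursty, SERVICE_TIME))
print(smooth_worst, bursty_worst)
```

Both patterns keep the device 50% busy over the second, yet the worst response time in the bursty case is ten service times while the smooth case never queues.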

SLIDE 11

Short Term Utilization: A theorem

If the work arriving in every period of length T is such that the device can do it in T seconds, then the response time is always less than T seconds.

❏ Setting T = maximum response time allowed meets requirements.
❏ But ... this requirement is too strict.

[Figure: request arrivals; work in system]
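The theorem can be checked on a small trace. This is a sketch under assumptions not stated on the slide: a single FIFO device, work measured as seconds of service demand, and a hand-made trace of (arrival time, service demand) pairs. The helper names are hypothetical.

```python
def max_work_in_window(requests, T):
    """Largest total service demand arriving in any closed window of length T.

    It is enough to try windows that start at an arrival time, because
    sliding a window right to its first arrival never drops a request.
    """
    requests = sorted(requests)
    best = 0.0
    for i, (start, _) in enumerate(requests):
        work = sum(s for t, s in requests[i:] if t <= start + T)
        best = max(best, work)
    return best


def fifo_response_times(requests):
    """Response times at a single FIFO server for (arrival, service) pairs."""
    free_at = float("-inf")
    responses = []
    for t, s in sorted(requests):
        start = max(t, free_at)
        free_at = start + s
        responses.append(free_at - t)
    return responses


T = 0.2  # maximum response time allowed, in seconds
trace = [(0.00, 0.05), (0.01, 0.05), (0.02, 0.05), (0.50, 0.10), (0.55, 0.05)]

work_bound_holds = max_work_in_window(trace, T) <= T
worst_response = max(fifo_response_times(trace))
print(work_bound_holds, worst_response)
```

On this trace no window of 0.2 s receives more than 0.2 s of work, and the worst response time stays within 0.2 s, as the theorem predicts.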

SLIDE 12

Short Term Utilization: An approximation

Pr{Work arriving in T < what device can do in T} > p => Pr{Response time < T} > p

❏ Translates bound on response time tail into a bound on tail of Work(T)
❏ Approximation is exact for p=1
❏ Distribution of Work arriving in time T frequently easy to calculate or approximate for simple workloads.
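As one example of a "simple workload" where the distribution of Work(T) is easy to calculate, assume Poisson request arrivals and a fixed service time per request. Neither assumption comes from the slide; they are chosen because the number of arrivals in T is then Poisson and the tail has a closed form.

```python
import math


def poisson_cdf(k, mean):
    """Pr{N <= k} for N ~ Poisson(mean)."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # Pr{N = 0}
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # Pr{N = i} from Pr{N = i - 1}
        total += term
    return total


def prob_work_fits(rate, service_time, T):
    """Pr{work arriving in T can be done in T}.

    Assumes Poisson arrivals at `rate` requests/sec and a fixed
    `service_time` per request, so the work fits when at most
    floor(T / service_time) requests arrive in the window.
    """
    max_requests = math.floor(T / service_time + 1e-9)  # guard float rounding
    return poisson_cdf(max_requests, rate * T)


p_light = prob_work_fits(rate=50, service_time=0.01, T=0.2)
p_heavy = prob_work_fits(rate=100, service_time=0.01, T=0.2)
print(p_light, p_heavy)
```

Under the approximation, a target such as "95% of requests within 0.2 sec" is accepted when this probability exceeds 0.95. The lighter load passes that test and the heavier one does not.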

SLIDE 13

Workload Characterization: TPC-D workload traces: application phases

[Figure: four trace plots of IOs/sec against time (sec)]

SLIDE 14

Workload characterization: Phased correlated model

Each workload is modeled as an ON-OFF Poisson process

❏ Parameters: ON time average, OFF time average, IO rate during ON period
❏ Correlation between workloads: p_ij = Pr{A_j is ON when A_i comes ON}
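One way to read the correlation parameter is as a frequency that can be counted from traces. The sketch below assumes each workload's ON periods are already available as (start, end) intervals. That input format, the function names, and the choice to count a workload that turns ON at the same instant as being ON are all assumptions for illustration.

```python
def is_on(on_intervals, t):
    """True if time t falls inside one of the half-open ON intervals."""
    return any(start <= t < end for start, end in on_intervals)


def estimate_p(on_i, on_j):
    """Estimate p_ij: fraction of A_i's OFF-to-ON transitions at which A_j is ON."""
    starts = [start for start, _ in on_i]
    if not starts:
        raise ValueError("workload i never comes ON in this trace")
    return sum(1 for s in starts if is_on(on_j, s)) / len(starts)


# Hand-made ON periods in seconds (hypothetical data).
a_i = [(0, 10), (20, 30), (40, 50), (60, 70)]
a_j = [(0, 12), (18, 25), (45, 48)]

p_ij = estimate_p(a_i, a_j)
print(p_ij)
```

Here A_j is ON at two of the four instants when A_i comes ON, so the estimate is 0.5. Note that p_ij and p_ji are counted at different instants and need not be equal.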

SLIDE 15

Phasing and Short term utilization: Combining forces

❏ Response times increase only when some workload goes ON
❏ Sufficient to test response time bounds only at the times workloads change state from OFF to ON
❏ Workload distribution is easy to estimate given a workload just went ON.
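The test at OFF-to-ON transitions can be sketched as follows. This is not the authors' model. It assumes that, given workload i just came ON, every other workload j is ON independently with probability p[i][j], that ON workloads send Poisson requests at a constant rate throughout the window T, and that each request needs a fixed service time. All function and parameter names are hypothetical.

```python
import itertools
import math


def poisson_cdf(k, mean):
    """Pr{N <= k} for N ~ Poisson(mean)."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def prob_fits_given_on(i, rates, p, service_time, T):
    """Pr{work arriving in T fits in T | workload i just came ON}."""
    others = [j for j in range(len(rates)) if j != i]
    max_requests = math.floor(T / service_time + 1e-9)
    total = 0.0
    # Enumerate every ON/OFF combination of the other workloads.
    for states in itertools.product([False, True], repeat=len(others)):
        weight = 1.0
        rate = rates[i]
        for j, on in zip(others, states):
            weight *= p[i][j] if on else 1.0 - p[i][j]
            if on:
                rate += rates[j]
        total += weight * poisson_cdf(max_requests, rate * T)
    return total


def device_ok(rates, p, service_time, T, target):
    """Test the bound at each workload's OFF-to-ON transition."""
    return all(
        prob_fits_given_on(i, rates, p, service_time, T) >= target
        for i in range(len(rates))
    )


never_overlap = device_ok([30, 30], [[1, 0], [0, 1]], 0.01, 0.2, 0.95)
always_overlap = device_ok([60, 60], [[1, 1], [1, 1]], 0.01, 0.2, 0.95)
print(never_overlap, always_overlap)
```

Two workloads that are never ON together can share the device in this sketch, while two heavier workloads that always overlap cannot. The enumeration is exponential in the number of workloads on a device, which is acceptable only for small sets.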

SLIDE 16

Validation and testing: Tasting the stew

Compared simulation and modelling results

❏ Baseline case: 8 streams, correlated sets of 4, 2, 2. All predictions were correct.
❏ Checking tightness of predictions: are the bounds optimistic (wrong) or pessimistic?

[Figure: tightness against inter-arrival time; series cosa0 to cosa7; label: inaccuracies]

SLIDE 17

Validation and testing: The validation loop

[Diagram: Forum validation loop. Labels: application; KItrace; workload specs; Forum solver; assignments; predictions; Panopticon; application measurement; measurements; compare]

SLIDE 18

Validation and testing: The pudding

[Figure: Query execution times: 25 vs. 15 disks. Execution time (sec) for queries 1 to 15]

SLIDE 19

Capacity planning: What next?

❏ Better device models
❏ Better workload models
❏ Fault-tolerant on-line management

SLIDE 20

Attribute-managed storage: The future

Need guaranteed quality of service?
Storage distributed across the network?
Continually changing workload?

NO PROBLEM!

http://www.hpl.hp.com/SSP