


slide-1
SLIDE 1

Extracting Flexible, Replayable Models from Large Block Traces

Vasily Tarasov1, Santhosh Kumar1, Jack Ma2, Dean Hildebrand3, Anna Povzner2, Geoff Kuenning2, Erez Zadok1

1Stony Brook University 2Harvey Mudd College 3IBM Research – Almaden

T2M

slide-2
SLIDE 2 Extracting Workload Models from Block Traces – FAST 2012 2 2/11/2012

Outline

  • 1. Traces and their problems
  • 2. Workload models suitability
  • 3. Design of the model extractor
  • 4. Evaluation
  • 5. Conclusions
slide-3
SLIDE 3

Traces

[Figure: an example trace. Each trace record (event) carries a timestamp, an operation (read or write), an I/O size, and an offset.]

  • In the general case, any event can be traced (process forking, file accesses, user logins)
  • A timestamp is a common field
  • Other fields depend on the specific events traced
  • We used block traces
  • Our approach is valid for any trace
slide-4
SLIDE 4

Trace Use Cases

  • Workload analysis and characterization
      ♦ Tune existing systems
      ♦ Design new systems
  • Trace replay
      ♦ Evaluate, compare, and validate system behavior

Traces are a highly valuable source, but there are problems.

slide-5
SLIDE 5

Problems with Trace Replay

  • Large in size
  • Disturb results
      ♦ Replayer bottlenecks on I/O
      ♦ Cache pollution
  • Hard to distribute
  • Static objects
  • Hard to intelligently and systematically modify the workload
  • Not easy to compare

slide-6
SLIDE 6

Outline

  • 1. Traces and their problems
  • 2. Workload models suitability
  • 3. Design of the model extractor
  • 4. Evaluation
  • 5. Conclusions
slide-7
SLIDE 7

Statistics Matter

[Figure: the Monday trace and the Tuesday trace; the observed system response is the same for both.]

  • Monday's trace is not exactly the same as Tuesday's trace
  • Yet the system's responses are the same
  • Statistics of the workload in the traces impact the system:
      ♦ read/write ratio
      ♦ I/O size
  • The set of statistics depends on the specific system

Observed system responses:

  • Latency
  • Throughput
  • Power
  • Disk utilization

slide-8
SLIDE 8

Outline

  • 1. Traces and their problems
  • 2. Workload models suitability
  • 3. Design of the model extractor
  • 4. Evaluation
  • 5. Conclusions
slide-9
SLIDE 9

Design Goals

  • Accuracy
      ♦ System responses match
  • Conciseness
      ♦ Small model size
  • Flexibility
      ♦ Trade model size for accuracy
      ♦ Existing benchmarks for workload generation
  • Extensibility
      ♦ Statistics and benchmarks

slide-10
SLIDE 10

Trace Chunking

[Figure: I/O size over trace time, with sample values 1KB, 2KB, 6KB, 0.5KB, 2KB, 2KB, 0.5KB, 8KB.]

  • Chunk the trace:
      ♦ Fixed chunking first
      ♦ Then deduplicate chunks
      ♦ This often results in variable chunking
  • Workload changes in the trace over time
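The chunking steps on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout `(timestamp, op, size, offset)`, the two-field histogram, the normalized L1 distance, and the threshold value are all assumptions made for the example, since the slides do not name a specific similarity metric.

```python
from collections import Counter

def fixed_chunks(records, interval):
    """Group (timestamp, op, size, offset) records into fixed time intervals."""
    chunks = []
    for rec in records:
        idx = int(rec[0] // interval)   # assumes non-negative timestamps
        while len(chunks) <= idx:
            chunks.append([])           # keep empty intervals as empty chunks
        chunks[idx].append(rec)
    return chunks

def histogram(chunk):
    """Count records per (operation, I/O size) pair."""
    return Counter((op, size) for _, op, size, _ in chunk)

def distance(h1, h2):
    """Normalized L1 distance in [0, 1]; an assumed metric, not from the talk."""
    total = sum(h1.values()) + sum(h2.values())
    if total == 0:
        return 0.0
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2)) / total

def deduplicate(hists, threshold):
    """Return unique histograms and, per chunk, the index of its representative."""
    unique, sequence = [], []
    for h in hists:
        for i, u in enumerate(unique):
            if distance(h, u) <= threshold:
                sequence.append(i)      # similar enough: reuse existing histogram
                break
        else:
            unique.append(h)            # new kind of workload: store it
            sequence.append(len(unique) - 1)
    return unique, sequence

# Hypothetical six-record trace: two seconds of 4 KB reads, then 8 KB writes.
trace = [
    (0.1, "read", 4096, 0), (0.4, "read", 4096, 4096),
    (1.2, "read", 4096, 8192), (1.7, "read", 4096, 12288),
    (2.3, "write", 8192, 0), (2.8, "write", 8192, 8192),
]
chunks = fixed_chunks(trace, interval=1.0)
unique, sequence = deduplicate([histogram(c) for c in chunks], threshold=0.1)
print(len(unique), sequence)  # 2 [0, 0, 1]
```

In this sketch the first two one-second chunks map to the same histogram, so they effectively merge into one two-second chunk, which is one way fixed chunking followed by deduplication ends up as variable chunking.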

slide-11
SLIDE 11

Within a Chunk

  • Assume stationary workload
  • Feature functions

Trace field vector: p = (p1, p2, …, pn)
Feature function: f1 = f1(p, s1), where s1 is the state
Feature function vector: f = (f1(p, s1), f2(p, s2), …, fn(p, sn))

The feature function values are put into a multi-dimensional histogram.

slide-12
SLIDE 12

Multi-Dimensional Histogram

Trace fields (p):

  • p1 – operation: read – 0, write – 1
  • p2 – I/O size: in KB
  • p3 – offset: in KB

Feature functions (f):

  • f1 = p1 (operation)
  • f2 = p2 (I/O size)
  • f3 = log(offset – s3.prev_offset) (inter-arrival distance)

[Figure: three-dimensional histogram with axes f1 (operation: read – 0, write – 1), f2 (I/O size in KB: 1, 2, 4, 8), and f3 (inter-arrival distance, logarithmic, in KB).]
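As a rough illustration of the three feature functions above, the following Python sketch builds the histogram for a handful of records. The dictionary field names, the use of base-2 logarithms with integer buckets, and the handling of the first record and of distances below 1 KB (both mapped to bucket 0) are assumptions; the slide only gives f3 = log(offset – s3.prev_offset).

```python
import math
from collections import Counter

def f_operation(p, state):
    """f1: operation, read = 0 and write = 1."""
    return 0 if p["op"] == "read" else 1

def f_io_size(p, state):
    """f2: I/O size in KB, used as-is."""
    return p["size_kb"]

def f_distance(p, state):
    """f3: log2 bucket of the distance to the previous offset (assumed bucketing)."""
    prev = state.get("prev_offset", p["offset_kb"])
    state["prev_offset"] = p["offset_kb"]      # the state s3 remembers the last offset
    dist = abs(p["offset_kb"] - prev)
    return int(math.log2(dist)) if dist >= 1 else 0

def build_histogram(records, features):
    """Map each record to its feature vector and count identical vectors."""
    states = [{} for _ in features]            # one private state per feature function
    hist = Counter()
    for p in records:
        hist[tuple(f(p, s) for f, s in zip(features, states))] += 1
    return hist

# Hypothetical records; field names are illustrative only.
records = [
    {"op": "read", "size_kb": 4, "offset_kb": 0},
    {"op": "read", "size_kb": 4, "offset_kb": 4},
    {"op": "write", "size_kb": 8, "offset_kb": 8},
    {"op": "read", "size_kb": 4, "offset_kb": 12},
]
hist = build_histogram(records, [f_operation, f_io_size, f_distance])
for cell, count in sorted(hist.items()):
    print(cell, count)
```

Each histogram cell is keyed by the tuple (f1, f2, f3), so adding another feature function only lengthens the key; this is one simple way to get the extensibility listed among the design goals.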

slide-13
SLIDE 13

Benchmark Plugins

  • Yet another workload generator?
  • Use existing benchmarks instead
  • Benchmark plugin: translates chunk histograms into a workload description in the benchmark's language
      ♦ command line arguments for IOzone
      ♦ config files for Filebench or FIO
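To make the plugin idea concrete, here is a hedged Python sketch that turns one chunk histogram into a fragment of an FIO job file. The slides do not show the actual plugin output, so the mapping is an assumption: the histogram is keyed by a hypothetical `(operation, size_kb)` pair, and it is translated into the FIO options `rw`, `rwmixread`, and `bssplit`. A complete job would also need a target (for example `filename` and `size`), which is omitted here.

```python
from collections import Counter

def to_percentages(counts):
    """Integer percentages that sum to 100; the leftover goes to the largest bucket."""
    total = sum(counts.values())
    pct = {k: v * 100 // total for k, v in counts.items()}
    largest = max(counts, key=counts.get)
    pct[largest] += 100 - sum(pct.values())
    return {k: v for k, v in pct.items() if v > 0}   # drop buckets that round to 0%

def fio_job(name, hist):
    """Render a histogram keyed by (operation, size_kb) as an FIO job section."""
    ops, sizes = Counter(), Counter()
    for (op, size_kb), n in hist.items():
        ops[op] += n
        sizes[size_kb] += n
    read_pct = ops[0] * 100 // sum(ops.values())     # operation 0 means read
    split = ":".join(f"{kb}k/{p}" for kb, p in sorted(to_percentages(sizes).items()))
    return "\n".join([
        f"[{name}]",
        "rw=randrw",
        f"rwmixread={read_pct}",
        f"bssplit={split}",
    ])

# Hypothetical chunk: three 4 KB reads and one 8 KB write.
job = fio_job("chunk0", Counter({(0, 4): 3, (1, 8): 1}))
print(job)
```

For this input the sketch prints a `[chunk0]` section with `rwmixread=75` and `bssplit=4k/75:8k/25`. An IOzone or Filebench plugin would consume the same histogram and emit command line arguments or a different config syntax instead.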

slide-14
SLIDE 14

Overall Design

[Figure: processing pipeline. Trace → Fixed Chunking → Histogram Collection → Deduplication → Benchmark Plugin → workload description in the benchmark's language → Benchmark.]

Parameters:

  • Initial time interval
  • Features and histogram granularity
  • Similarity metrics and threshold

slide-15
SLIDE 15

Outline

  • 1. Traces and their problems
  • 2. Workload models suitability
  • 3. Design of the model extractor
  • 4. Evaluation
  • 5. Conclusions
slide-16
SLIDE 16

Evaluation

  • 1. Replayed the trace
  • 2. Emulated workload
  • 3. Compared response (accuracy) parameters

Response (accuracy) parameters:

  • Reads/sec
  • Writes/sec
  • Latency
  • I/O utilization
  • I/O queue length
  • Request size
  • CPU utilization
  • Memory consumption
  • Interrupts
  • Context switches
  • Wait processes
  • Power
slide-17
SLIDE 17

Evaluation setup

Systems:

  • Physical setup
      ♦ Single node with physical disk drives
  • Virtual setup
      ♦ VM with disk image on remote GPFS server

Traces:

  • Finance1
      ♦ OLTP applications
  • MS-WBS
      ♦ Microsoft build server

slide-18
SLIDE 18

Finance1 on Physical System

[Figure: throughput (ops/second) versus time (seconds), showing Reads/Sec and Writes/Sec for both replay and emulation.]

  • Average relative error <10% across all parameters and systems
  • 17–25× size reduction
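The slides do not spell out how the average relative error is computed. One plausible reading, sketched below in Python under that assumption, is the mean of |emulated − replayed| / |replayed| over paired samples of a response parameter. The function name and the sample values are hypothetical and are not measurements from the talk.

```python
def average_relative_error(replay, emulation):
    """Mean of |emulated - replayed| / |replayed| over paired samples."""
    # Skip samples where the replay value is zero, since the ratio is undefined there.
    pairs = [(r, e) for r, e in zip(replay, emulation) if r != 0]
    if not pairs:
        raise ValueError("no non-zero replay samples to compare against")
    return sum(abs(e - r) / abs(r) for r, e in pairs) / len(pairs)

# Hypothetical reads/sec samples for the same time intervals.
replay_reads = [100.0, 200.0]
emulated_reads = [110.0, 180.0]
error = average_relative_error(replay_reads, emulated_reads)
print(f"{error:.1%}")
```

Applying the same function to each response parameter (latency, queue length, power, and so on) and to each system would yield one error figure per combination, which can then be checked against a target such as the 10% bound.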

slide-19
SLIDE 19

Outline

  • 1. Traces and their problems
  • 2. Workload models suitability
  • 3. Design of the model extractor
  • 4. Evaluation
  • 5. Conclusions
slide-20
SLIDE 20

Conclusions

  • Extractor of workload models from traces
      ♦ Multi-dimensional histograms of feature functions
      ♦ Trace chunking
      ♦ Trade off accuracy for size reduction
      ♦ Standard benchmarks

slide-21
SLIDE 21

Future work

  • More of everything
      ♦ Accuracy parameters, systems, traces
  • File system traces
  • Automatic selection of parameters
      ♦ Chunking interval, matrix granularity
  • Operations on models

slide-22
SLIDE 22

Extracting Flexible, Replayable Models from Large Block Traces

Vasily Tarasov1, Santhosh Kumar1, Jack Ma2, Dean Hildebrand3, Anna Povzner2, Geoff Kuenning2, Erez Zadok1

1Stony Brook University 2Harvey Mudd College 3IBM Research – Almaden

Q&A

http://goo.gl/yFdrG