Extracting Flexible, Replayable Models from Large Block Traces
Vasily Tarasov1, Santhosh Kumar1, Jack Ma2, Dean Hildebrand3, Anna Povzner2, Geoff Kuenning2, Erez Zadok1
1Stony Brook University 2Harvey Mudd College 3IBM Research – Almaden
Outline
Traces
Trace record: timestamp, operation, I/O size, offset
[Figure: example block trace; the sample records are reads and writes with timestamps between 0.5 and 2.0 and I/O sizes of 4096 and 8192]
Events that can be traced: process forking, file accesses, user logins
Each trace covers specific events
Trace Use Cases
Workload analysis and characterization
♦ Tune existing systems
♦ Design new systems
Trace replay
♦ Evaluate, compare, and validate system behavior
Traces are a highly valuable source, but there are problems
Problems with Trace Replay
Large in size
♦ Disturb results
♦ Hard to distribute
Static objects
♦ Hard to intelligently and systematically modify the workload
♦ Not easy to compare
Statistics Matter
[Figure: Monday trace compared with Tuesday trace]
A Monday trace is not exactly the same as a Tuesday's trace
Statistics in the traces impact the system:
♦ read/write ratio
♦ I/O size
Observe the system's response: the same for both traces
Design Goals
Accuracy: system responses match
Conciseness: small model size
Flexibility: trade model size for accuracy
Existing benchmarks for workload generation
Extensibility: statistics and benchmarks
Trace Chunking
[Figure: I/O size over trace time; per-chunk I/O sizes range from 0.5 KB to 8 KB]
Chunk the trace:
♦ Fixed chunking first
♦ Then deduplicate chunks
♦ This often results in variable chunking
Workload changes in the trace over time
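The chunking steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(timestamp, op, size, offset)`, the `interval` parameter, and the coarse per-chunk signature used as the deduplication key are all assumptions made for the example.

```python
from collections import defaultdict

def fixed_chunks(records, interval):
    """Group (timestamp, op, size, offset) records into fixed time windows.

    Windows that contain no records are simply absent from the result.
    """
    chunks = defaultdict(list)
    for rec in records:
        chunks[int(rec[0] // interval)].append(rec)
    return [chunks[i] for i in sorted(chunks)]

def signature(chunk):
    """Hypothetical chunk signature: (read fraction, mean I/O size in KB), rounded."""
    reads = sum(1 for r in chunk if r[1] == "read")
    mean_size = sum(r[2] for r in chunk) / len(chunk)
    return (round(reads / len(chunk), 1), round(mean_size / 1024))

def deduplicate(chunks):
    """Store each distinct signature once and describe the trace as runs.

    Adjacent chunks with the same signature collapse into one run, which is
    how fixed chunking can end up as variable-length chunks.
    """
    unique = []    # distinct signatures, stored once
    sequence = []  # [index into unique, run length in fixed chunks]
    for chunk in chunks:
        sig = signature(chunk)
        if sig not in unique:
            unique.append(sig)
        idx = unique.index(sig)
        if sequence and sequence[-1][0] == idx:
            sequence[-1][1] += 1
        else:
            sequence.append([idx, 1])
    return unique, sequence
```

With a one-second interval, two consecutive read-only windows would be stored as a single run of length 2 that points at one shared signature.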
Within a Chunk
Assume stationary workload
Feature functions
Trace field vector: $p = (p_1, p_2, \ldots, p_n)$
Feature function: $f_1 = f_1(p, s_1)$, where $s_1$ is the state
Feature function vector: $f = (f_1(p, s_1), f_2(p, s_2), \ldots, f_n(p, s_n))$
Put into a multi-dimensional histogram
Multi-Dimensional Histogram
Trace fields:
♦ $p_1$: operation (read = 0, write = 1)
♦ $p_2$: I/O size, in KB
♦ $p_3$: offset, in KB
Feature functions:
♦ $f_1 = p_1$ (operation)
♦ $f_2 = p_2$ (I/O size)
♦ $f_3 = \log(\text{offset} - s_3.\text{prev\_offset})$ (inter-arrival distance)
[Figure: three-dimensional histogram with axes $f_1$ (operation: read = 0, write = 1), $f_2$ (I/O size in KB: 1, 2, 4, 8), and $f_3$ (inter-arrival distance, logarithmic, in KB); each cell holds a count]
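As an illustration of how the three feature functions could fill such a histogram, here is a minimal sketch. Several details are assumptions that the slide does not specify: the logarithm is taken base 2, the distance is made non-negative with `abs` and shifted by 1 so that a zero distance is defined, the result is floored into integer buckets, and the previous offset starts at 0.

```python
import math
from collections import Counter

def build_histogram(records):
    """Count (f1, f2, f3) tuples for records given as (op, size_kb, offset_kb)."""
    hist = Counter()
    prev_offset = 0  # state s3; starting at 0 is an assumption
    for op, size_kb, offset_kb in records:
        f1 = 0 if op == "read" else 1           # operation
        f2 = size_kb                            # I/O size in KB
        distance = abs(offset_kb - prev_offset)
        f3 = math.floor(math.log2(distance + 1))  # assumed log bucketing
        hist[(f1, f2, f3)] += 1
        prev_offset = offset_kb                 # update state for next record
    return hist
```

Each key of the returned `Counter` is one histogram cell, and its value is the number of requests in the chunk that fell into that cell.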
Benchmark Plugins
Yet another workload generator? Use existing benchmarks instead
Benchmark plugin: turns chunk histograms into a workload description in the benchmark's language
♦ command-line arguments for IOzone
♦ config files for Filebench or FIO
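To make the plugin idea concrete, the sketch below renders one chunk histogram as an FIO job section. It is not the authors' plugin: the function name `fio_job`, the histogram layout `{(op, size_kb): count}`, the default file size, and the choice to express the workload through the fio options `rw`, `rwmixread`, and `bssplit` are assumptions for illustration, and I/O sizes are assumed to be whole kilobytes.

```python
from collections import Counter

def fio_job(name, hist, filename, size="64m"):
    """Render a histogram {(op, size_kb): count} as an fio job section.

    op is 0 for read and 1 for write, matching the feature f1.
    """
    total = sum(hist.values())
    read_count = sum(c for (op, _), c in hist.items() if op == 0)
    read_pct = round(100 * read_count / total)

    # Collapse the operation dimension to get the I/O size distribution.
    by_size = Counter()
    for (_, size_kb), count in hist.items():
        by_size[size_kb] += count
    sizes = sorted(by_size)
    pcts = [round(100 * by_size[s] / total) for s in sizes]
    # Rounding can leave the sum slightly off; push the difference into
    # the most common size so the split totals exactly 100.
    pcts[pcts.index(max(pcts))] += 100 - sum(pcts)
    bssplit = ":".join(f"{s}k/{p}" for s, p in zip(sizes, pcts) if p > 0)

    return "\n".join([
        f"[{name}]",
        f"filename={filename}",
        f"size={size}",
        "rw=randrw",
        f"rwmixread={read_pct}",
        f"bssplit={bssplit}",
    ])
```

A histogram with 60% reads and an 80/20 split between 4 KB and 8 KB requests would yield a section containing `rwmixread=60` and `bssplit=4k/80:8k/20`. The offset feature is ignored here; a fuller plugin would also need to map it onto the benchmark's access-pattern options.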
Overall Design
Pipeline: Trace → Fixed Chunking → Histogram Collection → Deduplication → Benchmark Plugin → Workload description in benchmark's language → Benchmark
Parameters:
♦ Fixed chunking: initial time interval
♦ Histogram collection: features and histogram granularity
♦ Deduplication: similarity metrics and threshold
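The slide names a similarity metric and a threshold as deduplication parameters without saying which metric is used. As one possible instance, the sketch below compares two chunk histograms with a normalized L1 distance; the metric, the function names, and the default threshold of 0.1 are assumptions chosen for illustration.

```python
def histogram_distance(a, b):
    """Normalized L1 distance between two count histograms, in [0, 1].

    Each histogram is a dict mapping a cell key to a count. Counts are
    converted to fractions first, so chunks of different length compare fairly.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    keys = set(a) | set(b)
    return 0.5 * sum(
        abs(a.get(k, 0) / total_a - b.get(k, 0) / total_b) for k in keys
    )

def is_duplicate(a, b, threshold=0.1):
    """Treat two chunks as duplicates when their histograms are close enough."""
    return histogram_distance(a, b) <= threshold
```

Raising the threshold merges more chunks and shrinks the model, while lowering it keeps more distinct chunks, which is one way to trade model size for accuracy.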
Evaluation
Parameters measured: reads/sec, writes/sec, latency, I/O utilization, I/O queue length, request size, CPU utilization, memory consumption, interrupts, context switches, wait processes, power
Evaluation setup
Physical Setup: single node with physical disk drives
Virtual Setup: VM with disk image on remote GPFS server
Finance1: OLTP applications
MS-WBS: Microsoft build server
Finance1 on Physical System
[Figure: throughput (ops/second) over time (seconds); series: reads/sec and writes/sec, each for replay and for emulation]
Average relative error <10% across all parameters and systems
17−25× size reduction
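The slides do not spell out how the average relative error was computed. One common reading, sketched below under that assumption, is the mean of the per-sample relative differences between the replayed trace and the emulated model; the function names and the choice to skip samples where the replay value is zero are illustrative.

```python
def average_relative_error(replay, emulation):
    """Mean of |emulation - replay| / |replay| over paired samples.

    Samples where the replay value is zero are skipped, because the
    relative error is undefined there.
    """
    pairs = [(r, e) for r, e in zip(replay, emulation) if r != 0]
    return sum(abs(e - r) / abs(r) for r, e in pairs) / len(pairs)

def size_reduction(trace_bytes, model_bytes):
    """Ratio of the original trace size to the extracted model size."""
    return trace_bytes / model_bytes
```

For example, replay samples of 100, 200, and 400 reads/sec against emulation samples of 110, 190, and 400 give per-sample errors of 10%, 5%, and 0%, for an average of 5%.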
Conclusions
Extractor of workload models from traces
Multi-dimensional histograms of feature functions
Trace chunking
Trade off accuracy for size reduction
Standard benchmarks
Future work
More of everything: accuracy parameters, systems, traces
File system traces
Automatic selection of parameters: chunking interval, matrix granularity
Operations on models
http://goo.gl/yFdrG