Extracting Flexible, Replayable Models from Large Block Traces
T2M
Vasily Tarasov¹, Santhosh Kumar¹, Jack Ma², Dean Hildebrand³, Anna Povzner², Geoff Kuenning², Erez Zadok¹
¹ Stony Brook University, ² Harvey Mudd College, ³ IBM Research – Almaden

Outline
1. Traces and their problems
2. Workload models suitability
3. Design of the model extractor
4. Evaluation
5. Conclusions
Extracting Workload Models from Block Traces – FAST 2012 2/11/2012

Traces
● In the general case, any event can be traced (process forking, file accesses, user logins)
● Timestamp is a common field
● Other fields depend on the specific events traced
● We used block traces
● Our approach is valid for any trace

Each event produces one trace record:

Timestamp  Operation  I/O size  Offset
0          read       4096      0
0.5        read       4096      4096
0.7        read       4096      8192
1.3        write      8192      28762
1.5        write      8192      32768
1.6        read       4096      12288
2.0        read       4096      14384

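A minimal sketch of how such records could be loaded, assuming a whitespace-separated text format with one record per line. The `TraceRecord` type, the `parse_line` helper, and the units (seconds, bytes) are illustrative assumptions, not part of the talk:

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    timestamp: float  # assumed to be seconds since the start of the trace
    operation: str    # "read" or "write"
    io_size: int      # assumed to be bytes
    offset: int       # assumed to be bytes


def parse_line(line):
    """Parse one 'timestamp operation io_size offset' line into a record."""
    timestamp, operation, io_size, offset = line.split()
    return TraceRecord(float(timestamp), operation, int(io_size), int(offset))


# The sample records shown on the slide.
SAMPLE = """0 read 4096 0
0.5 read 4096 4096
0.7 read 4096 8192
1.3 write 8192 28762
1.5 write 8192 32768
1.6 read 4096 12288
2.0 read 4096 14384"""

records = [parse_line(line) for line in SAMPLE.splitlines()]
```

Real block traces carry more fields and use other encodings, so a parser for a specific trace format would differ.
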
Trace Use Cases
● Workload analysis and characterization
  ♦ Tune existing systems
  ♦ Design new systems
● Trace replay
  ♦ Evaluate, compare, and validate system behavior
● Highly valuable source
● There are problems

Problems with Trace Replay
● Large in size
  ♦ Disturb results: replayer bottlenecks on I/O, cache pollution
  ♦ Hard to distribute
● Static objects
  ♦ Hard to intelligently and systematically modify the workload
  ♦ Not easy to compare

Statistics Matter
● Monday's trace is not exactly the same as Tuesday's trace
● The system's responses are the same
● Statistics of the workload in the traces impact the system:
  ♦ read/write ratio
  ♦ I/O size
● The set of statistics depends on the specific system
[Figure: Monday trace and Tuesday trace, each labeled "Same"; observed system response: latency, throughput, power, disk utilization]

Design Goals
● Accuracy
  ♦ System responses match
● Conciseness
  ♦ Small model size
● Flexibility
  ♦ Trade model size for accuracy
  ♦ Existing benchmarks for workload generation
● Extensibility
  ♦ Statistics and benchmarks

Trace Chunking
● Workload changes in the trace over time
[Figure: I/O size versus trace time; the sizes shown include 8KB, 6KB, 2KB, 1KB, and 0.5KB]
● Chunk the trace:
  ♦ Fixed chunking first
  ♦ Then deduplicate chunks
  ♦ This often results in variable chunking

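The fixed chunking step can be sketched as grouping records by time interval. This is an illustration only: the `chunk_trace` name, the tuple layout, and the one-second interval are assumptions, and the records are the sample ones from the Traces slide:

```python
from collections import defaultdict


def chunk_trace(records, interval):
    """Group (timestamp, op, size, offset) records into fixed-length time chunks.

    Returns a list of chunks ordered by time; intervals with no records
    are skipped in this sketch.
    """
    chunks = defaultdict(list)
    for record in records:
        timestamp = record[0]
        chunks[int(timestamp // interval)].append(record)
    return [chunks[index] for index in sorted(chunks)]


trace = [
    (0.0, "read", 4096, 0),
    (0.5, "read", 4096, 4096),
    (0.7, "read", 4096, 8192),
    (1.3, "write", 8192, 28762),
    (1.5, "write", 8192, 32768),
    (1.6, "read", 4096, 12288),
    (2.0, "read", 4096, 14384),
]

chunks = chunk_trace(trace, interval=1.0)
```

When neighboring chunks are later found to be similar and are deduplicated, one model covers several consecutive intervals, which is how fixed chunking turns into variable chunking.
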
Within a Chunk
● Assume stationary workload
● Feature functions
  ♦ Trace field vector: $p = (p_1, p_2, \ldots, p_n)$
  ♦ Feature function: $f_1 = f_1(p, s_1)$, where $s_1$ is the state
  ♦ Feature function vector: $f = (f_1(p, s_1), f_2(p, s_2), \ldots, f_n(p, s_n))$
● Put the feature function vectors into a multi-dimensional histogram

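A feature function takes the trace field vector plus its own state, so it can depend on earlier records. A small sketch of that idea, where the dictionary-based record and state layout and all function names are assumptions made for illustration:

```python
def f_operation(p, s):
    """Stateless feature: 0 for a read, 1 for a write."""
    return 0 if p["op"] == "read" else 1


def f_offset_delta(p, s):
    """Stateful feature: offset difference from the previous record.

    The state s remembers the previous offset; the first record
    yields 0 in this sketch.
    """
    previous = s.get("prev_offset", p["offset"])
    s["prev_offset"] = p["offset"]
    return p["offset"] - previous


def feature_vector(p, functions, states):
    """Evaluate every feature function on record p with its own state."""
    return tuple(f(p, s) for f, s in zip(functions, states))


functions = [f_operation, f_offset_delta]
states = [{}, {}]  # one state object per feature function

v1 = feature_vector({"op": "read", "offset": 0}, functions, states)
v2 = feature_vector({"op": "write", "offset": 4096}, functions, states)
```

Keeping one state object per function lets each feature track what it needs without the functions knowing about each other.
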
Multi-Dimensional Histogram
Trace field vector p:
  ♦ $p_1$ – operation: read – 0, write – 1
  ♦ $p_2$ – I/O size: in KB
  ♦ $p_3$ – offset: in KB
Feature function vector f:
  ♦ $f_1 = p_1$ (operation)
  ♦ $f_2 = p_2$ (I/O size)
  ♦ $f_3 = \log(\text{offset} - s_3.\text{prev\_offset})$ (inter-arrival distance)
[Figure: 3-D histogram with axes $f_1$ (operation: read – 0, write – 1), $f_2$ (I/O size in KB), and $f_3$ (inter-arrival distance, logarithmic, in KB). Cell counts visible in the figure, one row per I/O size:]

I/O size (KB)  Counts along the inter-arrival distance axis
1              100  791   38   12
2               60   95  412   32
4               99   27   10  198
8                0    0    0    0

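The three feature functions above can be sketched as a sparse histogram keyed by (operation, I/O size, distance bucket). Several details here are assumptions the slide does not specify: base-2 logarithm, taking the absolute distance, mapping distances below 1 KB to bucket 0, and starting with a previous offset of 0. The input is the sample trace from the Traces slide, with sizes and offsets treated as bytes:

```python
import math
from collections import Counter


def build_histogram(records):
    """Count records per (f1, f2, f3) cell.

    f1: operation (read = 0, write = 1)
    f2: I/O size in KB
    f3: floor(log2(|offset - prev_offset| in KB)), clamped to 0 below 1 KB
    """
    histogram = Counter()
    prev_offset = 0  # state s_3; initial value is an assumption
    for _timestamp, operation, size, offset in records:
        f1 = 0 if operation == "read" else 1
        f2 = size // 1024
        distance_kb = abs(offset - prev_offset) / 1024
        f3 = math.floor(math.log2(distance_kb)) if distance_kb >= 1 else 0
        prev_offset = offset
        histogram[(f1, f2, f3)] += 1
    return histogram


trace = [
    (0.0, "read", 4096, 0),
    (0.5, "read", 4096, 4096),
    (0.7, "read", 4096, 8192),
    (1.3, "write", 8192, 28762),
    (1.5, "write", 8192, 32768),
    (1.6, "read", 4096, 12288),
    (2.0, "read", 4096, 14384),
]

histogram = build_histogram(trace)
```

A sparse `Counter` only stores non-empty cells, which is convenient when most combinations never occur; a dense array would suit fixed, small bucket ranges.
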
Benchmark Plugins
● Yet another workload generator? Use existing benchmarks instead
● Benchmark plugin: chunk histograms → benchmark plugins → workload description in the benchmark's language
  ♦ command line arguments for IOzone
  ♦ config files for Filebench or FIO

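As a rough illustration of what a plugin does, the sketch below renders one fio job section from a chunk's operation and I/O size counts. It is not the authors' plugin: the function names and the file path are hypothetical, the access pattern is assumed to be random (`rw=randrw`), and only the fio options `filename`, `rw`, `rwmixread`, and `bssplit` are used. The counts come from the sample trace on the Traces slide (five 4 KB reads, two 8 KB writes):

```python
def integer_percentages(counts):
    """Turn bucket counts into integer percentages that sum to exactly 100."""
    total = sum(counts.values())
    pct = {key: counts[key] * 100 // total for key in sorted(counts)}
    # Give the rounding remainder to the most frequent bucket.
    largest = max(counts, key=counts.get)
    pct[largest] += 100 - sum(pct.values())
    return pct


def fio_job(name, reads, writes, size_counts_kb, filename):
    """Render one fio job section from per-chunk counts."""
    read_pct = round(100 * reads / (reads + writes))
    split = integer_percentages(size_counts_kb)
    # bssplit takes blocksize/percentage pairs separated by colons.
    bssplit = ":".join(f"{kb}k/{p}" for kb, p in split.items() if p > 0)
    return "\n".join([
        f"[{name}]",
        f"filename={filename}",
        "rw=randrw",  # assumption: random mixed reads and writes
        f"rwmixread={read_pct}",
        f"bssplit={bssplit}",
    ]) + "\n"


job = fio_job("chunk0", reads=5, writes=2,
              size_counts_kb={4: 5, 8: 2}, filename="/tmp/fio-testfile")
print(job)
```

A usable job file would also need options such as the file size, run time, and I/O engine, and a fuller plugin would use the distance dimension of the histogram to choose between sequential and random access.
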
Overall Design
TRACE → Fixed Chunking → Histogram Collection → Deduplication → Benchmark Plugin → Workload description in Benchmark's language → Benchmark
Inputs to the pipeline:
● Initial time interval
● Similarity metrics and threshold
● Features and histogram granularity

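The deduplication stage compares chunk histograms using a similarity metric and a threshold. The slide does not name the metric, so the sketch below assumes a normalized L1 distance between histograms and a greedy first-match policy; all names and the example counts are hypothetical:

```python
def normalize(histogram):
    """Scale counts to fractions so chunks of different lengths compare fairly."""
    total = sum(histogram.values()) or 1  # guard against an empty chunk
    return {key: count / total for key, count in histogram.items()}


def distance(h1, h2):
    """Half the L1 distance between normalized histograms, in [0, 1]."""
    a, b = normalize(h1), normalize(h2)
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys) / 2


def deduplicate(histograms, threshold):
    """Return (unique histograms, index of the unique histogram for each chunk)."""
    unique, mapping = [], []
    for histogram in histograms:
        for index, candidate in enumerate(unique):
            if distance(histogram, candidate) <= threshold:
                mapping.append(index)
                break
        else:
            # No stored histogram is close enough: keep this one.
            unique.append(histogram)
            mapping.append(len(unique) - 1)
    return unique, mapping


chunk_histograms = [
    {"a": 9, "b": 1},
    {"a": 90, "b": 10},
    {"a": 1, "b": 9},
    {"a": 88, "b": 12},
]
unique, mapping = deduplicate(chunk_histograms, threshold=0.05)
```

Raising the threshold merges more chunks and shrinks the model at the cost of accuracy, which is the size-for-accuracy trade-off named in the design goals.
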
Evaluation
1. Replayed the trace
2. Emulated workload
3. Compared response (accuracy) parameters:
  ♦ Reads/sec
  ♦ Writes/sec
  ♦ Latency
  ♦ I/O Utilization
  ♦ I/O Queue length
  ♦ Request size
  ♦ CPU Utilization
  ♦ Memory consumption
  ♦ Interrupts
  ♦ Context Switches
  ♦ Wait Processes
  ♦ Power

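Comparing the two runs amounts to computing an error per response parameter. The sketch below assumes the relative error is |emulation − replay| / |replay|, averaged over parameters; the exact definition used in the talk is not given on the slide, and the numbers are made-up illustrative values, not measured results:

```python
def average_relative_error(replay, emulation):
    """Return (mean relative error, per-parameter relative errors).

    Parameters whose replay value is 0 are skipped, since the relative
    error is undefined for them.
    """
    errors = {
        name: abs(emulation[name] - value) / abs(value)
        for name, value in replay.items()
        if value != 0
    }
    return sum(errors.values()) / len(errors), errors


# Hypothetical measurements, for illustration only.
replay_run = {"reads_per_sec": 200.0, "writes_per_sec": 100.0, "latency_ms": 5.0}
emulated_run = {"reads_per_sec": 190.0, "writes_per_sec": 110.0, "latency_ms": 5.0}

mean_error, per_parameter = average_relative_error(replay_run, emulated_run)
```
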
Evaluation Setup
● Systems
  ♦ Physical setup: single node with physical disk drives
  ♦ Virtual setup: VM with disk image on remote GPFS server
● Traces
  ♦ Finance1: OLTP applications
  ♦ MS-WBS: Microsoft build server

Finance1 on Physical System
[Figure: throughput (ops/second, 0 to 300) versus time (seconds, 0 to 18000); series: Reads/Sec - Replay, Reads/Sec - Emulation, Writes/Sec - Replay, Writes/Sec - Emulation]
● Average relative error <10% across all parameters and systems
● 17–25× size reduction

Conclusions
● Extractor of workload models from traces
● Multi-dimensional histograms of feature functions
● Trace chunking
● Trade off accuracy for size reduction
● Standard benchmarks

Future Work
● More of everything: accuracy parameters, systems, traces
● File system traces
● Automatic selection of parameters: chunking interval, matrix granularity
● Operations on models

Extracting Flexible, Replayable Models from Large Block Traces
http://goo.gl/yFdrG
Q & A
Vasily Tarasov¹, Santhosh Kumar¹, Jack Ma², Dean Hildebrand³, Anna Povzner², Geoff Kuenning², Erez Zadok¹
¹ Stony Brook University, ² Harvey Mudd College, ³ IBM Research – Almaden