

Slide 1

Pianola: A script-based I/O benchmark

Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551. This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

John May

PSDW08, 17 November 2008

LLNL-PRES-406688

Slide 2

I/O benchmarking: What’s going on here?

  • Is my computer’s I/O system “fast”?
  • Is the I/O system keeping up with my application?
  • Is the app using the I/O system effectively?
  • What tools do I need to answer these questions?
  • And what exactly do I mean by “I/O system” anyway?

  • For this talk, an I/O system is everything involved in storing data, from the filesystem down to the storage hardware

Slide 3

Existing tools can measure general or application-specific performance

  • IOzone automatically measures I/O system performance for different operations and parameters (see the example invocation below)
    • Relatively little ability to customize I/O requests
  • Many application-oriented benchmarks
    • SWarp, MADbench2…
  • Interface-specific benchmarks
    • IOR, //TRACE

[Figure: IOzone 3-D surface plot, “Write new file” — throughput (0–600000 KB/sec) as a function of file size (64 KB–4 GB) and record size (4 KB–16 MB)]
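As an example of IOzone’s automatic mode (the size limit and output file here are arbitrary choices, not taken from the talk):

    $ iozone -a -g 4G -R -b results.xls
    # -a  auto mode: sweep operations, record sizes, and file sizes
    # -g  upper bound on the file sizes swept
    # -R  produce a report of the kind plotted above
    # -b  write that report to a spreadsheet file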

Slide 4

Measuring the performance that matters

  • System benchmarks only measure general response, not application-specific response
  • Third-party application-based benchmarks may not generate the stimulus you care about
  • In-house applications may not be practical benchmarks
    • Difficult for nonexperts to build and run
    • Nonpublic source cannot be distributed to vendors and collaborators
  • Need benchmarks that…
    • Can be generated and used easily
    • Model application-specific characteristics
Slide 5

Script-based benchmarks emulate real apps

  • Capture trace data from an application and generate the same sequence of operations in a replay benchmark
  • We began with //TRACE from CMU (Ganger’s group)
    • Records I/O events and intervening “compute” times
    • Focused on parallel I/O, but much of the infrastructure is useful for our sequential I/O work

[Diagram: the application runs against a capture library, which logs a stream of events (open, compute, read, compute, …) to a replay script; a replay tool then executes that script as a script-based benchmark]
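To make the pipeline concrete, a captured script might look something like this (an illustrative format, not //TRACE’s actual syntax):

    open    fd=3  path=out.dat  flags=O_WRONLY|O_CREAT
    compute 0.0042
    write   fd=3  bytes=65536
    compute 0.0011
    write   fd=3  bytes=65536
    close   fd=3

The replay tool walks this list, sleeping for each compute interval and reissuing each I/O call.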

Slide 6

Challenges for script-based benchmarking: Recording I/O calls at the right level

    fprintf( … );  /* work */
    fprintf( … );  /* more work */
    fprintf( … );
    …
    write( … );

[Diagram: at the system-call level, the sequence above appears as just open, compute, write]

  • Instrumenting at high level
    + Easy with LD_PRELOAD (see the sketch below)
    − Typically generates more events, so logs are bigger
    − Need to replicate formatting
    − Timing includes computation
  • Instrumenting at low level
    + Fewer types of calls to capture
    + Instrumentation is at the I/O system interface
    − Cannot use LD_PRELOAD to intercept all calls
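To illustrate why the high-level approach is easy with LD_PRELOAD, here is a minimal interposer for fwrite(); this is my own sketch, not Pianola’s capture library:

    /* capture.c — minimal LD_PRELOAD interposer for fwrite() (illustration only).
       Build: gcc -shared -fPIC -o libcapture.so capture.c -ldl
       Run:   LD_PRELOAD=./libcapture.so ./my_app */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    size_t fwrite(const void *buf, size_t size, size_t nmemb, FILE *stream)
    {
        /* Locate the real fwrite on first use. */
        static size_t (*real_fwrite)(const void *, size_t, size_t, FILE *);
        if (!real_fwrite)
            real_fwrite = (size_t (*)(const void *, size_t, size_t, FILE *))
                              dlsym(RTLD_NEXT, "fwrite");

        /* Log a timestamped event with write(2) to avoid recursing into stdio. */
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        char line[96];
        int n = snprintf(line, sizeof line, "%ld.%09ld fwrite %zu bytes\n",
                         (long)ts.tv_sec, ts.tv_nsec, size * nmemb);
        if (n > 0)
            write(STDERR_FILENO, line, (size_t)n);

        return real_fwrite(buf, size, nmemb, stream);
    }

The same trick cannot catch everything at the system-call level: calls issued inside libc itself never go through the dynamic linker, which is what motivates the strace and binary-instrumentation approaches on the next slides.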

Slide 7

First attempt at capturing system calls: Linux strace utility

  • Records any selected set of system calls
  • Easy to use: just add to command line
  • Produces easily parsed output

    $ strace -r -T -s 0 -e trace=file,desc ls
         0.000000 execve("/bin/ls", [...], [/* 29 vars */]) = 0 <0.000237>
         0.000297 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000047>
         0.000257 fstat64(3, {st_mode=S_IFREG|0644, st_size=64677, ...}) = 0 <0.000033>
         0.000394 close(3) = 0 <0.000015>
         0.000230 open("/lib/librt.so.1", O_RDONLY) = 3 <0.000046>
         0.000289 read(3, ""..., 512) = 512 <0.000028>
         ...

Slide 8

Strace results look reasonably accurate, but overall runtime is exaggerated

[Figure: cumulative read and write activity vs. execution time (0–450 sec.) for the original application and its strace-based replay; series: Application Read, Application Write, Replay Read, Replay Write]

                                 Read (sec.)   Write (sec.)   Elapsed (sec.)
    Uninstrumented                    —             —              324
    Instrumented application        41.8          11.8             402
    Replay                          37.0          11.8             390

Slide 9

For accurate recording, gather I/O calls using binary instrumentation

  • Can intercept and instrument specific system-level calls
  • Overhead of instrumentation is paid at program startup
  • Existing Jockey library works well for x86_32, but not ported to other platforms
  • Replay can be portable, though

Uninstrumented:

    mov  $0x4, %eax       # __NR_write
    mov  4(%esp), %ebx
    int  $0x80
    ret

Instrumented:

    mov  $0x4, %eax
    mov  4(%esp), %ebx
    jmp  trampoline       # replaces the int $0x80
    nop
    ret

    trampoline:
        save current state
        call my write function
        restore state
        jmp back to original code
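For illustration, the “my write function” the trampoline calls might look like the following sketch (invented names; not the actual Pianola/Jockey code):

    /* Hypothetical capture hook called from the trampoline in place of the
       original int $0x80 (illustration only). */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    ssize_t traced_write(int fd, const void *buf, size_t count)
    {
        /* Timestamp the event so the script can reproduce inter-event delays. */
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        fprintf(stderr, "%ld.%09ld write fd=%d bytes=%zu\n",
                (long)ts.tv_sec, ts.tv_nsec, fd, count);

        /* Issue the real write through the raw syscall interface. */
        return syscall(SYS_write, fd, buf, count);
    }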

Slide 10

Issues for accurate replay

  • Replay engine must be able to read and parse events quickly
  • Reading the script must not interfere significantly with the I/O activities being replicated
  • Script must be portable across platforms

[Diagram: replayed I/O events (open, compute, write); accurate replay means minimizing the I/O impact of reading the script and correctly reproducing inter-event delays]

Slide 11

Accurate replay: Preparsing, compression, and buffering

  • Text-formatted output script is portable across platforms
  • Instrumentation output is parsed into binary format and compressed (~30:1)
    • Conversion done on the target platform
  • Replay engine reads and buffers script data during “compute” phases between I/O events (see the sketch below)

[Diagram: the text script is preparsed into a compressed binary form, which the replay engine buffers between the replayed I/O events (open, compute, write)]
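A minimal sketch of such a replay loop, with an invented event format and helper (not Pianola’s actual structures):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Invented binary event record, for illustration only. */
    struct event {
        uint8_t  op;        /* e.g. OP_OPEN, OP_READ, OP_WRITE, OP_CLOSE */
        int32_t  fd;
        uint64_t bytes;
        uint64_t delay_ns;  /* recorded "compute" gap before this event */
    };

    void perform_io(const struct event *ev);  /* assumed: reissues the recorded call */

    void replay(FILE *script)
    {
        struct event ev;
        while (fread(&ev, sizeof ev, 1, script) == 1) {
            /* Reproduce the recorded compute delay; a real engine would
               overlap script read-ahead and decompression with this sleep
               so that reading the script does not perturb the replayed I/O. */
            struct timespec gap = {
                (time_t)(ev.delay_ns / 1000000000ULL),
                (long)(ev.delay_ns % 1000000000ULL)
            };
            nanosleep(&gap, NULL);

            perform_io(&ev);
        }
    }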

Slide 12

Replay timing and profile match original application well

[Figure: cumulative read and write activity vs. execution time (0–400 sec.) for the original application and its replay; series: Application Read, Application Write, Replay Read, Replay Write; the curves track closely]

                                 Read (sec.)   Write (sec.)   Elapsed (sec.)
    Uninstrumented                    —             —              314
    Instrumented application        35.8          12.8             334
    Replay                          35.7          12.5             319

Slide 13

Things that didn’t help

Compressing the text script as it’s generated
  • Only 2:1 compression
  • The timing of the I/O events themselves is not what matters during the instrumentation phase

Replicating the memory footprint
  • Memory used by the application is taken from the same pool as the I/O buffer cache
  • A smaller application (like the replay engine) should therefore run faster, because more buffer space is available
  • Replicated the memory footprint by tracking brk() and mmap() calls (sketched below), but it made no difference!
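The replay-side footprint replication might have looked like this sketch (hypothetical helper, not the actual code):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Hypothetical replay-side handler: when the script says the traced
       application grew its footprint by len bytes (via brk() or mmap()),
       reserve and touch the same amount so the replay competes with the
       OS buffer cache the way the real application did. */
    void replay_footprint_growth(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            memset(p, 1, len);  /* touch pages so they are actually resident */
    }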

Slide 14

Conclusions on script-based I/O benchmarking

  • Gathering accurate I/O traces is harder than it seems
    • Currently, no solution is both portable and efficient
  • Replay is easier, but efficiency still matters
  • Many possibilities for future work: which matter most?
    • File name transformation
    • Parallel trace and replay
    • More portable instrumentation
    • How to monitor mmap’d I/O?