15-213 “The course that gives CMU its Zip!”: Time Measurement



SLIDE 1

15-213
“The course that gives CMU its Zip!”

Time Measurement
Oct. 24, 2002

Topics

  • Time scales
  • Interval counting
  • Cycle counters
  • K-best measurement scheme

class18.ppt

SLIDE 2

– 2 – 15-213, F’02

Computer Time Scales

Two Fundamental Time Scales

  • Processor: ~10^-9 sec.
  • External events: ~10^-2 sec.
      • Keyboard input
      • Disk seek
      • Screen refresh

Implication

  • Can execute many instructions while waiting for an external event to occur
  • Can alternate among processes without anyone noticing

[Figure: Time Scale (1 GHz Machine). Logarithmic axis, Time (seconds), from 1.E-09 (1 ns) through 1 µs and 1 ms to 1.E+00 (1 s). Microscopic events: integer add, FP multiply, FP divide, keystroke interrupt handler. Macroscopic events: disk access, screen refresh, keystroke.]

SLIDE 3

Measurement Challenge

How Much Time Does Program X Require?

  • CPU time
      • How many total seconds are used when executing X?
      • Measure used for most applications
      • Small dependence on other system activities
  • Actual (“wall”) time
      • How many seconds elapse between the start and the completion of X?
      • Depends on system load, I/O times, etc.

Confounding Factors

  • How does time get measured?
  • Many processes share computing resources
      • Transient effects when switching from one process to another
      • Suddenly, the effects of alternating among processes become noticeable

SLIDE 4

“Time” on a Computer System

[Figure: timeline of real (wall clock) time, divided into three kinds of segments]

  • User time: time executing instructions in the user process
  • System time: time executing instructions in the kernel on behalf of the user process
  • Some other user’s time: time executing instructions in a different user’s process

user time + system time + other users’ time = real (wall clock) time

We will use the word “time” to refer to user time (cumulative user time).

SLIDE 5

Activity Periods: Light Load

  • Most of the time spent executing one process
  • Periodic interrupts every 10 ms
      • Interval timer
      • Keep system from executing one process to the exclusion of others
  • Other interrupts
      • Due to I/O activity
  • Inactivity periods
      • System time spent processing interrupts
      • ~250,000 clock cycles

[Figure: Activity Periods, Load = 1. Active and inactive periods plotted against Time (ms), up to 80 ms.]

SLIDE 6

Activity Periods: Heavy Load

  • Sharing processor with one other active process
  • From the perspective of this process, the system appears to be “inactive” for ~50% of the time
      • Other process is executing

[Figure: Activity Periods, Load = 2. Active and inactive periods plotted against Time (ms), up to 80 ms.]

SLIDE 7

Interval Counting

OS Measures Runtimes Using Interval Timer

  • Maintain 2 counts per process
      • User time
      • System time
  • On each timer interrupt, increment a counter for the executing process
      • User time if running in user mode
      • System time if running in kernel mode

SLIDE 8

Interval Counting Example

(a) Interval Timings

    Au Au Au As Bu Bs Bu Bu Bu Bu As Au Au Au Au Au Bs Bu Bu Bs Au Au Au As As

    A: 110u + 40s
    B: 70u + 30s

(b) Actual Times

    A: 120.0u + 33.3s
    B: 73.3u + 23.3s

[Figure: timelines for (a) and (b), with processes A and B alternating; time axis marked from 10 to 160.]
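
The tallies in panel (a) follow mechanically from the tick sequence: each timer interrupt charges one 10 ms interval to whichever process and mode happened to be running. A minimal sketch of that bookkeeping is below; the names `struct tally` and `count_ticks` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Per-process tallies, in timer ticks (hypothetical helper type). */
struct tally {
    int user;
    int sys;
};

/* Charge one tick per token. Each token is two letters: the process
 * (A or B) followed by the mode at the tick (u = user, s = system). */
static void count_ticks(const char *trace, struct tally *a, struct tally *b)
{
    char copy[256];

    assert(strlen(trace) < sizeof copy);
    strcpy(copy, trace);            /* strtok modifies its argument */

    a->user = a->sys = b->user = b->sys = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        struct tally *t = (tok[0] == 'A') ? a : b;
        if (tok[1] == 'u')
            t->user++;
        else
            t->sys++;
    }
}
```

Multiplying each tally by the 10 ms timer interval gives the reported times, for example 11 user ticks for A becoming 110u.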

SLIDE 9

Unix time Command

    time make osevent
    gcc -O2 -Wall -g -march=i486 -c clock.c
    gcc -O2 -Wall -g -march=i486 -c options.c
    gcc -O2 -Wall -g -march=i486 -c load.c
    gcc -O2 -Wall -g -march=i486 -o osevent osevent.c
    . . .
    0.820u 0.300s 0:01.32 84.8% 0+0k 0+0io 4049pf+0w

  • 0.82 seconds user time
      • 82 timer intervals
  • 0.30 seconds system time
      • 30 timer intervals
  • 1.32 seconds wall time
  • 84.8% of total was used running these processes
      • (0.82 + 0.30) / 1.32 = 0.848
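
A program can read the same user/system split for itself. The sketch below uses the POSIX `getrusage` call; the helper names `read_cpu_seconds` and `cpu_fraction` are hypothetical, and how a particular kernel accounts for these values (interval counting or something finer) is not something this sketch assumes.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User and system CPU time charged to the calling process, in seconds. */
struct cpu_seconds {
    double user;
    double sys;
};

static struct cpu_seconds read_cpu_seconds(void)
{
    struct rusage ru;
    struct cpu_seconds out;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void) rc;
    out.user = ru.ru_utime.tv_sec + 1e-6 * ru.ru_utime.tv_usec;
    out.sys  = ru.ru_stime.tv_sec + 1e-6 * ru.ru_stime.tv_usec;
    return out;
}

/* Fraction of wall-clock time spent running the measured processes,
 * the percentage column printed by the time command. */
static double cpu_fraction(double user, double sys, double wall)
{
    assert(wall > 0.0);
    return (user + sys) / wall;
}
```

With the figures from the slide, `cpu_fraction(0.82, 0.30, 1.32)` evaluates to about 0.848.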

SLIDE 10

Accuracy of Interval Counting

Worst Case Analysis

  • Timer interval = δ
  • Single process segment measurement can be off by ±δ
  • No bound on error for multiple segments
      • Could consistently underestimate, or consistently overestimate

[Figure: segment A on a timeline marked from 10 to 80 ms, showing the minimum and maximum actual durations.]

  • Computed time = 70 ms
  • Min actual = 60 + ε
  • Max actual = 80 - ε
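
The single-segment bound can be written out directly. This is a small sketch under the slide's assumptions (one contiguous segment, timer interval δ); `struct bounds` and `segment_bounds` are hypothetical names.

```c
#include <assert.h>

/* Open interval (min_ms, max_ms) that the true duration must lie in. */
struct bounds {
    double min_ms;
    double max_ms;
};

/* A segment charged `ticks` timer intervals of length delta_ms is reported
 * as ticks * delta_ms, but the true duration can differ by just under one
 * interval in either direction. */
static struct bounds segment_bounds(int ticks, double delta_ms)
{
    struct bounds b;

    assert(ticks >= 1 && delta_ms > 0.0);
    b.min_ms = (ticks - 1) * delta_ms;   /* actual is strictly greater */
    b.max_ms = (ticks + 1) * delta_ms;   /* actual is strictly smaller */
    return b;
}
```

For 7 ticks at δ = 10 ms this gives the open interval (60, 80) around the computed 70 ms.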

SLIDE 11

Accuracy of Interval Counting (cont.)

Average Case Analysis

  • Over/underestimates tend to balance out
  • As long as total run time is sufficiently large
      • Min run time ~1 second
      • 100 timer intervals
  • Consistently miss 4% overhead due to timer interrupts

SLIDE 12

Cycle Counters

  • Most modern systems have built-in registers that are incremented every clock cycle
      • Very fine grained
      • Maintained as part of process state
          » In Linux, counts elapsed global time
  • Special assembly code instruction to access
  • On (recent model) Intel machines:
      • 64-bit counter
      • RDTSC instruction sets %edx to high-order 32 bits, %eax to low-order 32 bits

SLIDE 13

Cycle Counter Period

Wrap Around Times for 550 MHz Machine

  • Low-order 32 bits wrap around every 2^32 / (550 × 10^6) = 7.8 seconds
  • Full 64 bits wrap around every 2^64 / (550 × 10^6) = 33,539,534,679 seconds
      • Roughly 1,060 years

For 2 GHz Machine

  • Low-order 32 bits every 2.1 seconds
  • Full 64 bits every 292 years
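
The wrap-around figures are just 2^bits divided by the clock rate. A short sketch that reproduces them, with `wrap_seconds` as a hypothetical helper name:

```c
#include <assert.h>

/* Seconds until a counter of the given width wraps at clock rate hz,
 * i.e. 2^bits / hz. Computed in double to avoid integer overflow. */
static double wrap_seconds(int bits, double hz)
{
    double counts = 1.0;

    assert(bits > 0 && hz > 0.0);
    for (int i = 0; i < bits; i++)
        counts *= 2.0;
    return counts / hz;
}
```

Dividing the 64-bit result by the number of seconds in a year (about 3.15 × 10^7) converts it to years.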

SLIDE 14

Measuring with Cycle Counter

Idea

  • Get current value of cycle counter
      • Store as a pair of unsigneds, cyc_hi and cyc_lo
  • Compute something
  • Get new value of cycle counter
  • Perform double-precision (two-word) subtraction to get elapsed cycles

    /* Keep track of most recent reading of cycle counter */
    static unsigned cyc_hi = 0;
    static unsigned cyc_lo = 0;

    void start_counter()
    {
        /* Get current value of cycle counter */
        access_counter(&cyc_hi, &cyc_lo);
    }

SLIDE 15

Accessing the Cycle Counter

  • GCC allows inline assembly code with a mechanism for matching registers with program variables
  • Code only works on an x86 machine compiling with GCC
  • Emit assembly with rdtsc and two movl instructions

    void access_counter(unsigned *hi, unsigned *lo)
    {
        /* Get cycle counter */
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            : /* No input */
            : "%edx", "%eax");
    }

SLIDE 16

Closer Look at Extended ASM

    asm("Instruction String"
        : Output List
        : Input List
        : Clobbers List);

Instruction String

  • Series of assembly commands
      • Separated by “;” or “\n”
      • Use “%%” where normally would use “%”

SLIDE 17

Closer Look at Extended ASM

Output List

  • Expressions indicating destinations for values %0, %1, …, %j
      • Enclosed in parentheses
      • Must be an lvalue
          » Value that can appear on the LHS of an assignment
  • Tag "=r" indicates that the symbolic value (%0, etc.) should be replaced by a register

SLIDE 18

Closer Look at Extended ASM

Input List

  • Series of expressions indicating sources for values %j+1, %j+2, …
      • Enclosed in parentheses
      • Any expression returning a value
  • Tag "r" indicates that the symbolic value (%0, etc.) will come from a register

SLIDE 19

Closer Look at Extended ASM

Clobbers List

  • List of register names that get altered by the assembly instructions
  • Compiler will make sure it doesn’t store something in one of these registers that must be preserved across the asm
      • That is, a value set before the asm and used after it
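
As a variation on access_counter, GCC's x86 constraints "=a" and "=d" bind an output directly to %eax and %edx, which removes the need for the movl instructions and the clobber list. The sketch below is not the slides' code: `read_cycle_counter` is a hypothetical name, and the non-x86 branch is an assumed fallback that returns monotonic-clock nanoseconds rather than cycles.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* rdtsc leaves the low 32 bits in %eax and the high 32 bits in %edx.
 * "=a" and "=d" tie the two outputs to exactly those registers. */
static uint64_t read_cycle_counter(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}
#else
#include <time.h>
/* Assumed fallback for other targets: nanoseconds from a monotonic
 * clock. This is not a cycle count; it only keeps the sketch portable. */
static uint64_t read_cycle_counter(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}
#endif
```

Returning a single 64-bit value also means elapsed cycles can be computed with one ordinary subtraction.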

SLIDE 20

Accessing the Cycle Counter (cont.)

Emitted Assembly Code

  • Used %ecx for *hi (replacing %0)
  • Used %ebx for *lo (replacing %1)
  • Does not use %eax or %edx for a value that must be carried across the inserted assembly code

        movl 8(%ebp),%esi     # hi
        movl 12(%ebp),%edi    # lo
    #APP
        rdtsc; movl %edx,%ecx; movl %eax,%ebx
    #NO_APP
        movl %ecx,(%esi)      # Store high bits at *hi
        movl %ebx,(%edi)      # Store low bits at *lo

SLIDE 21

Completing Measurement

  • Get new value of cycle counter
  • Perform double-precision (two-word) subtraction to get elapsed cycles
  • Express as double to avoid overflow problems

    double get_counter()
    {
        unsigned ncyc_hi, ncyc_lo;
        unsigned hi, lo, borrow;

        /* Get cycle counter */
        access_counter(&ncyc_hi, &ncyc_lo);

        /* Do double precision subtraction */
        lo = ncyc_lo - cyc_lo;
        borrow = lo > ncyc_lo;      /* low word wrapped below zero */
        hi = ncyc_hi - cyc_hi - borrow;

        /* (1 << 30) * 4 is 2^32, scaled in double to avoid int overflow */
        return (double) hi * (1 << 30) * 4 + lo;
    }
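
The borrow logic can be checked in isolation against a plain 64-bit subtraction. This is a sketch with hypothetical function names (`elapsed_split`, `elapsed_u64`) that takes the two readings as arguments instead of using the global cyc_hi and cyc_lo.

```c
#include <stdint.h>

/* Elapsed count from two (hi, lo) readings using 32-bit arithmetic
 * with an explicit borrow, mirroring get_counter(). */
static double elapsed_split(uint32_t new_hi, uint32_t new_lo,
                            uint32_t old_hi, uint32_t old_lo)
{
    uint32_t lo = new_lo - old_lo;        /* wraps modulo 2^32 */
    uint32_t borrow = lo > new_lo;        /* 1 if the low word wrapped */
    uint32_t hi = new_hi - old_hi - borrow;

    return (double) hi * 4294967296.0 + lo;
}

/* Same quantity with a single 64-bit subtraction. */
static double elapsed_u64(uint32_t new_hi, uint32_t new_lo,
                          uint32_t old_hi, uint32_t old_lo)
{
    uint64_t newv = ((uint64_t) new_hi << 32) | new_lo;
    uint64_t oldv = ((uint64_t) old_hi << 32) | old_lo;

    return (double) (newv - oldv);
}
```

When the low word wraps between readings, for example from 0xFFFFFFF0 to 5 with the high word going from 0 to 1, both functions report 21 elapsed cycles.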

SLIDE 22

Timing With Cycle Counter

Determine Clock Rate of Processor

  • Count number of cycles required for some fixed number of seconds

    double MHZ;
    int sleep_time = 10;

    start_counter();
    sleep(sleep_time);
    MHZ = get_counter() / (sleep_time * 1e6);

Time Function P

  • First attempt: simply count cycles for one execution of P

    double tsecs;

    start_counter();
    P();
    tsecs = get_counter() / (MHZ * 1e6);

SLIDE 23

Measurement Pitfalls

Overhead

  • Calling get_counter() incurs a small amount of overhead
  • Want to measure a long enough code sequence to compensate

Unexpected Cache Effects

  • Artificial hits or misses
  • E.g., these measurements were taken with the Alpha cycle counter:

    foo1(array1, array2, array3);   /* 68,829 cycles */
    foo2(array1, array2, array3);   /* 23,337 cycles */

    vs.

    foo2(array1, array2, array3);   /* 70,513 cycles */
    foo1(array1, array2, array3);   /* 23,203 cycles */

SLIDE 24

Dealing with Overhead & Cache Effects

  • Always execute function once to “warm up” cache
  • Keep doubling the number of times P() is executed until reaching some threshold
      • Used CMIN = 50000

    int cnt = 1;
    double cmeas = 0;
    double cycles;

    do {
        int c = cnt;
        P();                    /* Warm up cache */
        start_counter();        /* Restart timing for this batch */
        while (c-- > 0)
            P();
        cmeas = get_counter();
        cycles = cmeas / cnt;   /* Average cycles per call */
        cnt += cnt;
    } while (cmeas < CMIN);     /* Make sure have enough */
    return cycles / (1e6 * MHZ);

SLIDE 25

Multitasking Effects

Cycle Counter Measures Elapsed Time

  • Keeps accumulating during periods of inactivity
      • System activity
      • Running other processes

Key Observation

  • Cycle counter never underestimates program run time
  • Possibly overestimates by a large amount

K-Best Measurement Scheme

  • Perform up to N (e.g., 20) measurements of function
  • See if fastest K (e.g., 3) are within some relative factor ε (e.g., 0.001)
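
One way to sketch the K-best check: keep the K smallest measurements seen so far and stop once the K-th fastest is within a factor (1 + ε) of the fastest. That exact convergence test is an assumption made here, since the slide only says "within some relative factor ε"; the names `struct kbest`, `kbest_add`, and `KBEST_MAX` are likewise hypothetical.

```c
#include <assert.h>

#define KBEST_MAX 16   /* hypothetical upper bound on K for this sketch */

struct kbest {
    int k;                    /* number of fastest samples tracked */
    double eps;               /* relative tolerance */
    int count;                /* samples held so far, at most k */
    double best[KBEST_MAX];   /* fastest samples, ascending order */
};

static void kbest_init(struct kbest *s, int k, double eps)
{
    assert(k >= 1 && k <= KBEST_MAX);
    s->k = k;
    s->eps = eps;
    s->count = 0;
}

/* Record a measurement, keeping only the k smallest in ascending order. */
static void kbest_add(struct kbest *s, double v)
{
    int i;

    if (s->count < s->k)
        i = s->count++;              /* room left: append */
    else if (v < s->best[s->k - 1])
        i = s->k - 1;                /* replace the slowest kept sample */
    else
        return;                      /* slower than everything kept */

    while (i > 0 && s->best[i - 1] > v) {
        s->best[i] = s->best[i - 1]; /* shift larger entries up */
        i--;
    }
    s->best[i] = v;
}

/* Assumed criterion: k-th fastest <= (1 + eps) * fastest. */
static int kbest_converged(const struct kbest *s)
{
    return s->count == s->k &&
           s->best[s->k - 1] <= (1.0 + s->eps) * s->best[0];
}
```

A caller would time the function, pass each result to `kbest_add`, and stop when `kbest_converged` returns nonzero or after N attempts, reporting `best[0]` as the estimate.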

SLIDE 26

K-Best Validation

Very good accuracy for < 8 ms

  • Within one timer interval
  • Even when heavily loaded

Less accurate for > 10 ms

  • Light load: ~4% error
      • Interval clock interrupt handling
  • Heavy load: very high error

[Figure: Intel Pentium III, Linux; K = 3, ε = 0.001. Measured:Expected Error (log scale, 0.001 to 100) versus Expected CPU Time (ms), up to 50 ms, for Load 1, Load 2, and Load 11.]

SLIDE 27

Compensate For Timer Overhead

Subtract Timer Overhead

  • Estimate overhead of a single interrupt by measuring periods of inactivity
  • Call interval timer to determine the number of interrupts that have occurred

Better Accuracy for > 10 ms

  • Light load: 0.2% error
  • Heavy load: still very high error

[Figure: Intel Pentium III, Linux, compensating for timer interrupt handling; K = 3, ε = 0.001. Measured:Expected Error (log scale, 0.001 to 100) versus Expected CPU Time (ms), up to 50 ms, for Load 1, Load 2, and Load 11.]
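
The compensation step amounts to one subtraction. In this sketch `compensate_cycles` is a hypothetical name, and both the interrupt count and the per-interrupt cost are assumed to have been obtained separately, as the bullets above describe.

```c
#include <assert.h>

/* Remove the estimated cost of timer interrupts from a cycle measurement. */
static double compensate_cycles(double measured_cycles, long interrupts,
                                double cycles_per_interrupt)
{
    assert(interrupts >= 0 && cycles_per_interrupt >= 0.0);
    return measured_cycles - (double) interrupts * cycles_per_interrupt;
}
```

As an illustrative calculation with made-up inputs, a 10,000,000-cycle measurement that spanned 2 interrupts of 250,000 cycles each would be corrected to 9,500,000 cycles.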

SLIDE 28

K-Best on NT

Acceptable accuracy for < 50 ms

  • Scheduler allows process to run multiple intervals

Less accurate for > 10 ms

  • Light load: 2% error
  • Heavy load: generally very high error

[Figure: Pentium II, Windows NT; K = 3, ε = 0.001. Measured:Expected Error (log scale, 0.001 to 100) versus Expected CPU Time (ms), up to 300 ms, for Load 1, Load 2, and Load 11.]

SLIDE 29

Time of Day Clock

  • Unix gettimeofday() function
  • Returns elapsed time since reference time (Jan 1, 1970)
  • Implementation
      • Uses interval counting on some machines
          » Coarse grained
      • Uses cycle counter on others
          » Fine grained, but significant overhead and only 1 microsecond resolution

    #include <sys/time.h>
    #include <unistd.h>

    struct timeval tstart, tfinish;
    double tsecs;

    gettimeofday(&tstart, NULL);
    P();
    gettimeofday(&tfinish, NULL);
    /* tv_usec is in microseconds, so scale by 1e-6 to get seconds */
    tsecs = (tfinish.tv_sec - tstart.tv_sec) +
            1e-6 * (tfinish.tv_usec - tstart.tv_usec);

SLIDE 30

K-Best Using gettimeofday

Linux

  • As good as using cycle counter
  • For times > 10 microseconds

Windows

  • Implemented by interval counting
  • Too coarse-grained

[Figure: Using gettimeofday. Measured:Expected Error (-0.5 to 0.5) versus Expected CPU Time (ms), up to 300 ms, for Win-NT, Linux, and Linux-comp.]

SLIDE 31

Measurement Summary

Timing is highly case and system dependent

  • What is the overall duration being measured?
      • > 1 second: interval counting is OK
      • << 1 second: must use cycle counters
  • On what hardware / OS / OS version?
      • Accessing counters
          » How gettimeofday is implemented
      • Timer interrupt overhead
      • Scheduling policy

Devising a Measurement Method

  • Long durations: use Unix timing functions
  • Short durations
      • If possible, use gettimeofday
      • Otherwise must work with cycle counters
      • K-best scheme most successful