[PPT] - CS305 Computer Architecture Fall 2009 Lecture 04 Bhaskaran Raman PowerPoint Presentation

SLIDE 1

CS305 Computer Architecture Fall 2009 Lecture 04

Bhaskaran Raman Department of CSE, IIT Bombay

http://www.cse.iitb.ac.in/~br/
http://www.cse.iitb.ac.in/synerg/doku.php?id=public:courses:cs305-fall09:start

SLIDE 2

Today's Topics

Performance metrics, CPI
Performance comparison
Benchmarks

SLIDE 3

Performance Comparison

What performance metric to use?
User cares about response time
Performance is inversely proportional to execution time
What is execution time?
Response time
CPU time: User time + System time
System performance vs. CPU performance
Throughput vs. response-time
We will focus on CPU performance

SLIDE 4

Which Program's Execution Time?

Real “workload” is ideal
Practical options:
Real programs: compilers, office-suite, scientific...
Kernels: key pieces of programs

– Example: Livermore loops

Toy benchmarks: small programs

– Examples: Quick-sort, tower of Hanoi...

Synthetic benchmarks: try to capture “average” frequency of instructions in real programs

– Example: Whetstone, Dhrystone

SLIDE 5

More on Performance Comparisons...

Caveat of benchmarks
They are needed
But manufacturers tend to optimize for benchmarks
Need to be updated periodically
Benchmark suite: collection of programs
E.g. SPEC2000
Reporting performance
Reproducibility: program version, compiler, flags
SPEC specifies compiler flags for baseline comparison

SLIDE 6

Some Numerics...

Total (or average) execution time is a possible metric
Weighted execution time is better: $\sum_i W_i \times T_i$

                     Computer A   Computer B   Computer C
Program P1 (secs)             1           10           20
Program P2 (secs)          1000          100           20
Total (secs)               1001          110           40
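The two metrics on this slide can be sketched in a few lines of Python. The execution times come from the table above; the weights (P1 counted as 90% of the workload, P2 as 10%) are an illustrative assumption and not from the lecture, and the function names are hypothetical.

```python
# Execution times in seconds, taken from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine):
    """Unweighted total execution time over all programs."""
    return sum(times[machine].values())


def weighted_time(machine, weights):
    """Weighted execution time: sum over programs of W_i * T_i."""
    return sum(weights[prog] * t for prog, t in times[machine].items())


# ASSUMPTION: illustrative weights only; a real workload would supply these.
weights = {"P1": 0.9, "P2": 0.1}

for machine in times:
    print(machine, total_time(machine), weighted_time(machine, weights))
```

With these assumed weights, B (about 19 secs) edges out C (about 20 secs), even though C wins on total time. The ranking depends on how well the weights match the real workload.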

SLIDE 7

Normalizing the Performance

Normalize such that all programs take the same time, on some machine
Arithmetic mean predicts performance
Geometric mean?

        Norm(A)                 Norm(B)                 Norm(C)
        A      B      C         A      B      C         A      B      C
P1      1      10     20        0.1    1      2         0.05   0.5    1
P2      1      0.1    0.02      10     1      0.2       50     5      1
AM      1      5.05   10.01     5.05   1      1.1       25.03  2.75   1
GM      1      1      0.63      1      1      0.63      1.58   1.58   1
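The normalized table can be recomputed with a short Python sketch. The execution times are those of programs P1 and P2 on computers A, B and C from the earlier slide; the helper names are hypothetical.

```python
import math

# Execution times in seconds as [P1, P2] for each computer.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def normalized(machine, reference):
    """Times of `machine` divided, program by program, by those of `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


for reference in times:
    for machine in times:
        norm = normalized(machine, reference)
        print(
            f"Norm({reference}) {machine}: "
            f"AM={arithmetic_mean(norm):.2f} GM={geometric_mean(norm):.2f}"
        )
```

The ratio of geometric means between any two machines is the same whichever machine is used for normalization. The ranking by arithmetic mean changes with the reference: A looks best under Norm(A), B under Norm(B).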

SLIDE 8

Summary

Performance is inversely proportional to execution time
We are concerned with CPU time of an unloaded machine
Weighted execution time with weights from real workload is ideal
Else, normalize w.r.t. one machine

SLIDE 9

Amdahl's Law

Amdahl's law:
Diminishing returns
Limit on overall speedup
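Corollary: make the common case fast

[Figure: execution time before the enhancement, split into parts 1-F and F; after the enhancement, parts 1-F and F/Speedup]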

SLIDE 10

Amdahl's Law

Amdahl's law:
Diminishing returns
Limit on overall speedup

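[Figure: execution time before the enhancement, split into parts 1-F and F; after the enhancement, parts 1-F and F/Speedup]

Corollary: make the common case fast

$\text{Overall speedup} = \frac{1}{(1 - F) + \frac{F}{\text{Speedup}}}$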

SLIDE 11

Illustrating Amdahl's Law

Example: implement faster memory, or faster ALU?
Proposed memory speedup: 10x
Proposed ALU speedup: 3x
Depends on fraction of instructions

– Suppose $F_{mem} = 0.2$, $F_{alu} = 0.5$, $F_{other} = 0.3$

Speedup with faster memory $= \frac{1}{0.8 + 0.2/10} = 1.22$
Speedup with faster ALU $= \frac{1}{0.5 + 0.5/3} = 1.5$
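The two options can be checked with a minimal Python sketch of Amdahl's law. The function name `amdahl_speedup` is hypothetical, and F is treated as the fraction of execution time affected by the enhancement, as in the example on this slide.

```python
def amdahl_speedup(fraction, enhancement):
    """Overall speedup when `fraction` of the time is sped up by `enhancement`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if enhancement <= 0:
        raise ValueError("enhancement must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / enhancement)


mem = amdahl_speedup(0.2, 10)  # faster memory: F_mem = 0.2, 10x
alu = amdahl_speedup(0.5, 3)   # faster ALU:    F_alu = 0.5, 3x

print(f"faster memory: {mem:.2f}")
print(f"faster ALU:    {alu:.2f}")

# Limit on overall speedup: even an infinitely fast memory cannot beat 1/(1-F).
print(f"upper bound for memory: {1.0 / (1.0 - 0.2):.2f}")
```

The smaller ALU speedup wins because it applies to a larger fraction of the time. The memory option is capped at 1.25 however fast the memory becomes.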

SLIDE 12

Example continued...

Fixing $F_{alu} = 0.5$, for what value of $F_{mem}$ is going for a faster memory better?

$\frac{1}{(1 - F_{mem}) + F_{mem}/10} > 1.5 \Rightarrow F_{mem} > \frac{10}{27} \approx 0.37$

SLIDE 13

The CPU Performance Equation

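$\text{CPU time} = \text{Num. of clock cycles} \times \text{Clock cycle time}$

OR

$\text{CPU time} = \text{Num. of clock cycles} \div \text{Clock rate}$

For a program,

$\text{Num. of clock cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$

Putting these together,

$\text{CPU time} = IC \times CPI \times \text{Cycle time}$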
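A minimal Python sketch of the CPU performance equation in both forms. The instruction count, CPI and clock rate are made-up numbers chosen only to keep the arithmetic easy, and the function names are hypothetical.

```python
def cpu_time_from_rate(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC * CPI / clock rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_cycle_time(instruction_count, cpi, cycle_time_s):
    """CPU time = IC * CPI * clock cycle time, in seconds."""
    return instruction_count * cpi * cycle_time_s


# ASSUMPTION: illustrative values, not taken from the lecture.
ic = 2_000_000_000          # instructions executed by the program
cpi = 1.5                   # average clock cycles per instruction
clock_rate = 1_000_000_000  # 1 GHz, i.e. a 1 ns cycle time

t_rate = cpu_time_from_rate(ic, cpi, clock_rate)
t_cycle = cpu_time_from_cycle_time(ic, cpi, 1.0 / clock_rate)
print(t_rate, t_cycle)
```

Both forms give the same result because the clock cycle time is the reciprocal of the clock rate.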

SLIDE 14

More on the Equation

This form is convenient
Involves many relevant parameters
Remembering is easy

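$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Seconds}}{\text{Clock cycle}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Instructions}}{\text{Program}}$

Solving for CPI:

$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC}$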

SLIDE 15

Other Convenient Forms of the Equation

Number of clock cycles can be counted as:

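$\text{CPU clock cycles} = \sum_{i=1}^{n} CPI_i \times IC_i$

Hence, $\text{CPU time} = \left( \sum_{i=1}^{n} CPI_i \times IC_i \right) \times \text{Clock cycle time}$

Calculating $CPI$ in terms of $CPI_i$:

$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC}$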

SLIDE 16

Usefulness of the Equation

$IC_i$ is easier to measure than $F_i$
Equivalently, $F_i$ is measured through $IC_i$
Equation includes relevant parameters such as the cycle time

SLIDE 17

Measuring the Parameters for the Equation

Clock cycle time:
Easy for existing architectures
Needs to be estimated in the design process
Instruction Count:
Requires a compiler
And, simulator/interpreter, or instrumentation code
CPI for each instruction type:
Easy for simple architectures
Pipelines, caches introduce complications
Need to simulate and measure average CPI

SLIDE 18

A Design Example

A design choice for conditional branch instructions:
Choice 1: condition code is set by a compare instruction, checked by the next (branch) instruction

– 20% of instructions are branches, and another 20% are compares
– 2 cycles per branch, 1 cycle for all others
– Clock-rate is 25% faster

Choice 2: single instruction for compare and branch
Which choice is better?

SLIDE 19

Solution for Design Example

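Let $IC_1$ be the instruction count under choice 1 and $C$ the clock rate under choice 2.

$\text{CPU time}_1 = \frac{IC_1 \times [0.8 \times 1 + 0.2 \times 2]}{1.25 \times C} = \frac{IC_1}{C} \times \frac{1.2}{1.25}$

$\text{CPU time}_2 = \frac{IC_1 \times [0.6 \times 1 + 0.2 \times 2]}{C} = \frac{IC_1}{C}$

Since $1.2 / 1.25 = 0.96 < 1$, choice 1 is about 4% faster, even though it executes more instructions.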
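The same comparison as a small Python sketch. It normalizes the instruction count of choice 1 and the clock rate of choice 2 to 1, and assumes (as the solution implies) that the cycle counts of 2 per branch and 1 per other instruction hold for both choices. The variable names are hypothetical.

```python
IC1 = 1.0  # instruction count under choice 1 (normalized)
C = 1.0    # clock rate under choice 2 (normalized)

# Choice 1: 20% branches at 2 cycles, the other 80% (including the
# compares) at 1 cycle; the clock rate is 25% faster.
cycles1 = IC1 * (0.8 * 1 + 0.2 * 2)
time1 = cycles1 / (1.25 * C)

# Choice 2: the compares (20% of IC1) disappear. What remains is
# 0.2 * IC1 branches at 2 cycles and 0.6 * IC1 other instructions at 1 cycle.
cycles2 = IC1 * (0.6 * 1 + 0.2 * 2)
time2 = cycles2 / C

print(f"time1/time2 = {time1 / time2:.2f}")
print("choice 1 is better" if time1 < time2 else "choice 2 is better")
```

A ratio below 1 means the faster clock of choice 1 more than pays for its extra compare instructions.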