Performance, Power, Die Yield: CS301, Prof Szajda



slide-1
SLIDE 1

Performance, Power, Die Yield

CS301 Prof Szajda

slide-2
SLIDE 2

Administrative

  • HW #1 assigned
      ◦ Due Wednesday, 9/3 at 5:00 pm

slide-3
SLIDE 3

Performance Metrics

(How do we compare two machines?)

slide-4
SLIDE 4

What to Measure?


Which airplane has the best performance?

slide-5
SLIDE 5

Performance

  • One size does not fit all
  • Depends on application domain
      ◦ Scientific computing
      ◦ Graphics
      ◦ Databases
      ◦ General-purpose desktop
      ◦ Beware of designing to benchmark!
  • Depends on technology characteristics
      ◦ DRAM speed and capacity, chip size, etc.

slide-6
SLIDE 6

Which Metric Do We Use?

  • Response or execution time
      ◦ Difference between start and end time
      ◦ Individual user cares most about this
  • Throughput
      ◦ Total amount of work done in given time
      ◦ Frequently used for servers and clusters
  • How are these affected by
      ◦ Replacing processor with faster version?
      ◦ Adding more processors?

slide-7
SLIDE 7

Execution Time

  • Shorter execution time is better
  • Allows comparison between 2 machines

slide-8
SLIDE 8

Relative Performance

  • “X is n times faster than Y”
  • Example:
      ◦ Machine A takes 10s to run program
      ◦ Machine B takes 15s to run same program
      ◦ What is the performance ratio?

slide-9
SLIDE 9

Different Time Values

  • Execution time
      ◦ Wall-clock, response, or elapsed time
          ▪ Includes everything (processing, I/O, OS overhead, etc.)!
      ◦ Determines system performance
  • CPU time
      ◦ Time spent executing code for this task only
          ▪ Does not include I/O or time-sharing
      ◦ Comprises user CPU time and system CPU time
          ▪ Different programs are affected differently by CPU and system performance
      ◦ man time
          ▪ 90.7u 12.9s 2:39 65%
          ▪ User: 90.7 sec
          ▪ System: 12.9 sec
          ▪ Elapsed time: 2 min 39 sec

slide-10
SLIDE 10

Clock Cycles

  • Instead of expressing time in seconds, use clock cycles
  • Clock
      ◦ Determines when events take place
      ◦ Runs at constant rate (ex. 1 GHz)
      ◦ Easy to convert between clock rate and seconds
          ▪ Clock rate = 1 / Clock cycle time
          ▪ 500 MHz = 1 / (2 ns)
          ▪ 1 ns = 10⁻⁹ s
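The conversion between clock rate and time can be sketched as two small helpers. The function names and the 3-billion-cycle figure are illustrative assumptions, not values from the slides:

```python
NS_PER_S = 1e9  # nanoseconds per second


def cycle_time_ns(clock_rate_hz: float) -> float:
    """Clock cycle time in nanoseconds: 1 / clock rate."""
    return NS_PER_S / clock_rate_hz


def seconds_for_cycles(cycles: float, clock_rate_hz: float) -> float:
    """Convert a cycle count to seconds at a given clock rate."""
    return cycles / clock_rate_hz


print(cycle_time_ns(500e6))           # 2.0  (500 MHz -> 2 ns)
print(seconds_for_cycles(3e9, 1e9))   # 3.0  (hypothetical 3e9 cycles at 1 GHz)
```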

slide-11
SLIDE 11

Chapter 1 — Computer Abstractions and Technology —

CPU Clocking

n Operation of digital hardware governed by a

constant-rate clock

Clock (cycles) Data transfer
 and computation Update state Clock period

n Clock period: duration of a clock cycle

n e.g., 250ps = 0.25ns = 250×10–12s

n Clock frequency (rate): cycles per second

n e.g., 4.0GHz = 4000MHz = 4.0×109Hz

slide-12
SLIDE 12


CPU Time

n Performance improved by

n Reducing number of clock cycles n Increasing clock rate n Hardware designer must often trade off clock

rate against cycle count
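The relation behind these bullets can be written as follows, using the standard definition of CPU time in terms of cycle count and clock rate:

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

Fewer cycles or a higher clock rate both shrink CPU time, but a design change that raises the clock rate may also raise the cycle count, which is the trade-off noted above.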

slide-13
SLIDE 13


CPU Time Example

n Computer A: 2GHz clock, 10s CPU time n Designing Computer B

n Aim for 6s CPU time n Can do faster clock, but causes 1.2 × clock cycles

n How fast must Computer B clock be?

slide-14
SLIDE 14


Instruction Count and CPI

n Instruction Count for a program

n Determined by program, ISA and compiler

n Average cycles per instruction

n Determined by CPU hardware n If different instructions have different CPI

n Average CPI affected by instruction mix

slide-15
SLIDE 15


CPI Example

n Computer A: Cycle Time = 250ps, CPI = 2.0 n Computer B: Cycle Time = 500ps, CPI = 1.2 n Same ISA n Which is faster, and by how much?

A is faster… …by this much
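Because both computers run the same ISA, the instruction count cancels and it is enough to compare the time per instruction, CPI × cycle time. A minimal sketch under that assumption (names are illustrative):

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


time_a_ps = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
time_b_ps = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

# Same instruction count on both machines, so the ratio of CPU times
# equals the ratio of per-instruction times.
speedup_a_over_b = time_b_ps / time_a_ps

print(f"A: {time_a_ps:.0f} ps/instr, B: {time_b_ps:.0f} ps/instr, "
      f"A is {speedup_a_over_b:.1f}x faster")
```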

slide-16
SLIDE 16

Application Characteristics

  • Determine the mix of different instruction types
      ◦ Integer arithmetic
      ◦ Logical operations
      ◦ Floating point arithmetic
      ◦ Loads and stores
  • Different applications have different CPI because of different instruction mixes

slide-17
SLIDE 17


CPI in More Detail

n If different instruction classes take different

numbers of cycles

n Weighted average CPI

Relative frequency
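Written out, with $\text{CPI}_i$ and $\text{IC}_i$ denoting the CPI and instruction count of class $i$ (symbols chosen here for illustration):

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)$$

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{IC}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \right)$$

The factor $\text{IC}_i / \text{IC}$ is the relative frequency of class $i$.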

slide-18
SLIDE 18


CPI Example

n Alternative compiled code sequences using

instructions in classes A, B, C

Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1

n Sequence 1: IC = 5

n Clock Cycles


= 2×1 + 1×2 + 2×3
 = 10

n Avg. CPI = 10/5 = 2.0

n Sequence 2: IC = 6

n Clock Cycles


= 4×1 + 1×2 + 1×3
 = 9

n Avg. CPI = 9/6 = 1.5

slide-19
SLIDE 19


Performance Summary

n Performance depends on

n Algorithm: affects IC, possibly CPI n Programming language: affects IC, CPI n Compiler: affects IC, CPI n Instruction set architecture: affects IC, CPI, Tc

The BIG Picture

slide-20
SLIDE 20

Amdahl’s Law

  • How much speedup do you get from an enhancement?
  • Based on
      ◦ Fraction of time enhancement used
      ◦ Improvement in enhanced mode

$$\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$$

$$\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$$
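A minimal sketch of Amdahl's Law as a function. The 100 s baseline, 50% fraction, and 2× enhancement are hypothetical values chosen only to exercise the formula:

```python
def exec_time_new(exec_old_s: float, fraction_enh: float,
                  speedup_enh: float) -> float:
    """New execution time after enhancing a fraction of the old time."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must be between 0 and 1")
    if speedup_enh <= 0.0:
        raise ValueError("speedup_enh must be positive")
    # The unenhanced part is unchanged; the enhanced part shrinks.
    return exec_old_s * ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


exec_old_s = 100.0  # hypothetical baseline
exec_new_s = exec_time_new(exec_old_s, fraction_enh=0.5, speedup_enh=2.0)
overall_speedup = exec_old_s / exec_new_s

print(f"{exec_new_s:.1f} s, overall speedup {overall_speedup:.2f}x")  # 75.0 s, overall speedup 1.33x
```

Even though half the program runs twice as fast, the overall speedup is only about 1.33×, because the untouched half still takes its full time.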

slide-21
SLIDE 21


Pitfall: Amdahl’s Law

n Improving an aspect of a computer and

expecting a proportional improvement in overall performance

§1.10 Fallacies and Pitfalls

n Can’t be done!

n Example: multiply accounts for 80s/100s

n How much improvement in multiply performance to

get 5× overall?

n Corollary: make the common case fast

slide-22
SLIDE 22

Review Question

  • Your machine has a clock rate of 2.4GHz. How long is the clock cycle?

slide-23
SLIDE 23

Review Questions

  • Suppose you are given the following:
      ◦ Machine A
          ▪ 1 GHz
          ▪ Average CPI = 1.6
          ▪ Instructions = 1.7 Billion
      ◦ Machine B
          ▪ 3.3 GHz
          ▪ Average CPI = 6.1
          ▪ Instructions = 2 Billion
  • Which machine is faster? By how much?
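One way to check an answer to this question is to compute CPU time as instructions × CPI / clock rate for each machine. A sketch with illustrative names:

```python
def cpu_time_s(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


time_a_s = cpu_time_s(instructions=1.7e9, cpi=1.6, clock_rate_hz=1e9)
time_b_s = cpu_time_s(instructions=2.0e9, cpi=6.1, clock_rate_hz=3.3e9)

faster, ratio = ("A", time_b_s / time_a_s) if time_a_s < time_b_s \
    else ("B", time_a_s / time_b_s)

print(f"A: {time_a_s:.2f} s, B: {time_b_s:.2f} s, "
      f"{faster} is {ratio:.2f}x faster")
```

The machine with the higher clock rate is not automatically the faster one; its larger CPI and instruction count can outweigh the clock advantage.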

slide-24
SLIDE 24

Review Questions

  • What is the average CPI for a machine with the following CPIs on an application with the following instruction frequency?

Type         Frequency   CPI
Arithmetic   0.45        1
Memory       0.3         8
Control      0.2         3
Mult/Div     0.05        5
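An answer can be checked by weighting each type's CPI by its frequency and summing. A short sketch using the table's values (variable names are illustrative):

```python
# (frequency, CPI) per instruction type, taken from the table.
instruction_mix = {
    "Arithmetic": (0.45, 1),
    "Memory": (0.30, 8),
    "Control": (0.20, 3),
    "Mult/Div": (0.05, 5),
}

# Weighted average: sum of frequency x CPI over all types.
average_cpi = sum(freq * cpi for freq, cpi in instruction_mix.values())

print(f"Average CPI = {average_cpi:.2f}")  # Average CPI = 3.70
```

Memory instructions are only 30% of the mix but contribute 2.4 of the 3.7 cycles, so they dominate the average.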

slide-25
SLIDE 25

Review Questions

  • What factors must be included when comparing the relative performance of two machines?

slide-26
SLIDE 26

Amdahl’s Law

  • Suppose you have an enhancement that makes function 10x faster.
  • Speedup if used 5% of the time?
  • Speedup if used 40% of the time?

$$\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$$
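Both cases can be checked by plugging into the formula and taking Exec_old / Exec_new. The sketch also prints the limit 1 / (1 - fraction), the best possible speedup if the enhanced part took no time at all; names are illustrative:

```python
def overall_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup = Exec_old / Exec_new from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


for fraction in (0.05, 0.40):
    speedup = overall_speedup(fraction, speedup_enh=10.0)
    upper_bound = 1.0 / (1.0 - fraction)  # limit as speedup_enh grows without bound
    print(f"used {fraction:.0%} of the time: {speedup:.2f}x "
          f"(at most {upper_bound:.2f}x)")
```

At 5% usage the 10× enhancement yields under 5% overall gain; at 40% usage it yields roughly 1.56×.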

slide-27
SLIDE 27

Review Questions

  • What is the equation for execution time?

  • What does Amdahl’s Law say?
slide-28
SLIDE 28

Benchmarks

  • Programs specifically used to measure performance
  • Hope is that they are representative of how the computer will be used
  • Examples
      ◦ SPEC Integer and Floating Point
      ◦ MediaBench
      ◦ MineBench
      ◦ TPC

slide-29
SLIDE 29


SPEC CPU Benchmark

n Programs used to measure performance

n Supposedly typical of actual workload

n Standard Performance Evaluation Corp (SPEC)

n Develops benchmarks for CPU, I/O, Web, …

n SPEC CPU2006

n Elapsed time to execute a selection of programs

n Negligible I/O, so focuses on CPU performance

n Normalize relative to reference machine n Summarize as geometric mean of performance ratios

n CINT2006 (integer) and CFP2006 (floating-point)

slide-30
SLIDE 30


CINT2006 for Intel Core i7 920

slide-31
SLIDE 31


Recent Concern: Power Trends

n In CMOS IC technology

§1.7 The Power Wall ×1000 ×30 5V → 1V
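The annotations ×1000, ×30, and 5V → 1V can be related through the standard CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. The sketch below assumes that relation and, purely for illustration, holds capacitive load constant; it reads ×1000 as the frequency increase and ×30 as the power increase:

```python
def relative_dynamic_power(cap_ratio: float, voltage_ratio: float,
                           freq_ratio: float) -> float:
    """Ratio of new to old dynamic power for given scaling factors."""
    # Power scales linearly with capacitive load and frequency,
    # and quadratically with supply voltage.
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


power_scale = relative_dynamic_power(
    cap_ratio=1.0,            # assumption: capacitive load unchanged
    voltage_ratio=1.0 / 5.0,  # supply voltage dropped from 5 V to 1 V
    freq_ratio=1000.0,        # clock frequency grew by 1000x
)

print(f"Power grows by about {power_scale:.0f}x")
```

Under that simplifying assumption power grows about 40×, the same order as the ×30 shown. The quadratic voltage term is what kept power growth far below the 1000× growth in frequency.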

slide-32
SLIDE 32

Tricks to Increase Power

  • Attach large cooling devices
  • Turn off parts of chips not used in given clock cycle
      ◦ Can increase power to 300 watts...
      ◦ ...But these and other ways are all prohibitively expensive for desktop computers. So...

slide-33
SLIDE 33

More Recent Approaches: Chip Multiprocessors

  • Reasons for change
      ◦ Limited opportunities to improve single thread performance
      ◦ Power
      ◦ On-chip communication latencies

slide-34
SLIDE 34

Tapering Processor Performance

slide-35
SLIDE 35


Uniprocessor Performance

§1.8 The Sea Change: The Switch to Multiprocessors

Constrained by power, instruction-level parallelism, memory latency

slide-36
SLIDE 36


Multiprocessors

n Multicore microprocessors

n More than one processor per chip

n Requires explicitly parallel programming

n Compare with instruction level parallelism

n Hardware executes multiple instructions at once n Hidden from the programmer

n Hard to do

n Programming for performance n Load balancing n Optimizing communication and synchronization

slide-37
SLIDE 37


Concluding Remarks

n Cost/performance is improving

n Due to underlying technology development

n Hierarchical layers of abstraction

n In both hardware and software

n Instruction set architecture

n The hardware/software interface

n Execution time: the best performance

measure

n Power is a limiting factor

n Use parallelism to improve performance

§1.9 Concluding Remarks