Performance, Power, Die Yield
CS301 Prof Szajda
Administrative
- HW #1 assigned
  - Due Wednesday, 9/3 at 5:00 pm
Performance Metrics
(How do we compare two machines?)
What to Measure?
Which airplane has the best performance?
Performance
- One size does not fit all
- Depends on application domain
  - Scientific computing
  - Graphics
  - Databases
  - General-purpose desktop
  - Beware of designing to the benchmark!
- Depends on technology characteristics
  - DRAM speed and capacity, chip size, etc.
Which Metric Do We Use?
- Response or execution time
  - Difference between start and end time
  - Individual user cares most about this
- Throughput
  - Total amount of work done in a given time
  - Frequently used for servers and clusters
- How are these affected by
  - Replacing the processor with a faster version?
  - Adding more processors?
Execution Time
- Shorter execution time is better
- Allows comparison between two machines
Relative Performance
- “X is n times faster than Y”
- Example:
  - Machine A takes 10 s to run a program
  - Machine B takes 15 s to run the same program
  - What is the performance ratio?
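The ratio asked for above can be sketched in a few lines, assuming the usual definition that performance is the reciprocal of execution time, so X is n times faster than Y when n = time of Y / time of X. The function name `relative_performance` is illustrative, not from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_y / time_x)."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Machine A: 10 s, Machine B: 15 s (values from the example above)
ratio = relative_performance(10.0, 15.0)
print(f"A is {ratio:.1f} times faster than B")  # prints: A is 1.5 times faster than B
```

Note that the slower machine's time goes in the numerator, so a ratio above 1 means the first machine is faster.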
Different Time Values
- Execution time
  - Wall-clock, response, or elapsed time
    - Includes everything (processing, I/O, OS overhead, etc.)!
  - Determines system performance
- CPU time
  - Time spent executing code for this task only
    - Does not include I/O or time-sharing
  - Comprises user CPU time and system CPU time
    - Different programs are affected differently by CPU and system performance
  - man time
    - 90.7u 12.9s 2:39 65%
    - User: 90.7 sec
    - System: 12.9 sec
    - Elapsed time: 2 min 39 sec
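As a rough check on the `time` output above, the sketch below recomputes the last field, assuming the percentage is (user + system CPU time) divided by elapsed time. The helper name `cpu_utilization` is made up for illustration.

```python
def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent on the CPU for this task."""
    return (user_s + system_s) / elapsed_s


elapsed = 2 * 60 + 39      # 2:39 is 159 seconds
cpu_time = 90.7 + 12.9     # user + system CPU time, in seconds
utilization = cpu_utilization(90.7, 12.9, elapsed)
print(f"CPU time: {cpu_time:.1f} s, utilization: {utilization:.0%}")
# prints: CPU time: 103.6 s, utilization: 65%
```

The remaining 35% of the elapsed time went to I/O, other processes, and similar waits that CPU time does not count.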
Clock Cycles
- Instead of expressing time in seconds, use clock cycles
- Clock
  - Determines when events take place
  - Runs at a constant rate (e.g., 1 GHz)
  - Easy to convert between clock rate and seconds
    - Clock rate = 1 / clock cycle time
    - 500 MHz = 1 / (2 ns)
    - 1 ns = 10^-9 s
Chapter 1 — Computer Abstractions and Technology —
CPU Clocking
- Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram showing clock cycles, data transfer and computation, state update, and the clock period]
- Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = 250×10^-12 s
- Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
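The period and rate conversions above are reciprocals of each other. A minimal sketch, with illustrative helper names, that reproduces the 500 MHz and 250 ps figures from the slides:

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds for a clock rate in hertz."""
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate in hertz for a clock period in seconds."""
    return 1.0 / period_s


print(f"{period_from_rate(500e6) * 1e9:.2f} ns")     # 500 MHz clock, prints: 2.00 ns
print(f"{rate_from_period(250e-12) / 1e9:.2f} GHz")  # 250 ps period, prints: 4.00 GHz
```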
CPU Time
- Performance improved by
  - Reducing number of clock cycles
  - Increasing clock rate
- Hardware designer must often trade off clock rate against cycle count
CPU Time Example
- Computer A: 2 GHz clock, 10 s CPU time
- Designing Computer B
  - Aim for 6 s CPU time
  - Can do faster clock, but causes 1.2 × clock cycles
- How fast must Computer B clock be?
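One way to work the question above is sketched here, assuming the standard relation CPU time = clock cycles / clock rate. The function and parameter names are illustrative.

```python
def required_clock_rate(rate_a_hz: float, time_a_s: float,
                        cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs in order to hit target_time_s.

    B runs cycle_factor times as many clock cycles as A does.
    """
    cycles_a = time_a_s * rate_a_hz        # cycles = CPU time x clock rate
    cycles_b = cycle_factor * cycles_a     # the faster clock costs extra cycles
    return cycles_b / target_time_s        # rate = cycles / CPU time


rate_b = required_clock_rate(2e9, 10.0, 1.2, 6.0)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")  # prints: ... 4.0 GHz clock
```

Computer A executes 20×10^9 cycles, so B executes 24×10^9 cycles and must complete them in 6 s.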
Instruction Count and CPI
- Instruction Count for a program
  - Determined by program, ISA and compiler
- Average cycles per instruction
  - Determined by CPU hardware
  - If different instructions have different CPI
    - Average CPI affected by instruction mix
CPI Example
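- Computer A: Cycle Time = 250 ps, CPI = 2.0
- Computer B: Cycle Time = 500 ps, CPI = 1.2
- Same ISA
- Which is faster, and by how much?
  - For the same instruction count I, A takes I × 2.0 × 250 ps = I × 500 ps and B takes I × 1.2 × 500 ps = I × 600 ps
  - A is faster, by a factor of 600 / 500 = 1.2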
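The comparison above can be checked numerically. This sketch assumes CPU time = instruction count × CPI × cycle time. Because both machines share an ISA, the instruction count cancels and only the time per instruction matters. The names are illustrative.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds (CPI x cycle time)."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2.0, 250.0)   # Computer A
time_b = time_per_instruction_ps(1.2, 500.0)   # Computer B
speedup = time_b / time_a                      # how many times faster A is than B
print(f"A: {time_a:.0f} ps, B: {time_b:.0f} ps, A is {speedup:.1f}x faster")
# prints: A: 500 ps, B: 600 ps, A is 1.2x faster
```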
Application Characteristics
- Determine the mix of different instruction types
  - Integer arithmetic
  - Logical operations
  - Floating point arithmetic
  - Loads and stores
- Different applications have different CPI because of different instruction mixes
CPI in More Detail
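- If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

- Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

  - The ratio Instruction Count_i / Instruction Count is the relative frequency of class i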
CPI Example
- Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

- Sequence 1: IC = 5
  - Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  - Avg. CPI = 10/5 = 2.0
- Sequence 2: IC = 6
  - Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  - Avg. CPI = 9/6 = 1.5
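The two sequences above can be recomputed with a short sketch of the weighted-average CPI calculation. The dictionary layout and function names are illustrative choices. The per-class CPIs and instruction counts come from the table.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}   # cycles per instruction for each class


def clock_cycles(instruction_counts: dict) -> int:
    """Total cycles: sum over classes of (CPI of class x instructions in class)."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in instruction_counts.items())


def average_cpi(instruction_counts: dict) -> float:
    """Weighted average CPI: total cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(f"{name}: cycles = {clock_cycles(seq)}, avg CPI = {average_cpi(seq):.1f}")
# prints: Sequence 1: cycles = 10, avg CPI = 2.0
# prints: Sequence 2: cycles = 9, avg CPI = 1.5
```

Sequence 2 executes more instructions but finishes in fewer cycles, which is why instruction count alone is a poor performance measure.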
Performance Summary
- Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, Tc
- IC = instruction count, CPI = cycles per instruction, Tc = clock cycle time
The BIG Picture
Amdahl’s Law
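- How much speedup do you get from an enhancement?
- Based on
  - Fraction of time enhancement used
  - Improvement in enhanced mode

$$\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$$

$$\text{Exec}_{\text{new}} = \text{Exec}_{\text{old}} \times \left( (1 - \text{fraction}_{\text{enh}}) + \frac{\text{fraction}_{\text{enh}}}{\text{Speedup}_{\text{enh}}} \right)$$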
Pitfall: Amdahl’s Law
§1.10 Fallacies and Pitfalls
- Improving an aspect of a computer and expecting a proportional improvement in overall performance
- Example: multiply accounts for 80s/100s
  - How much improvement in multiply performance to get 5× overall?
  - Can't be done!
- Corollary: make the common case fast
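A small sketch shows why the multiply example above has no answer. It assumes the improved time is (affected time / improvement factor) + unaffected time. The function name and the choice to return `math.inf` for an unreachable target are illustrative.

```python
import math


def required_improvement(total_s: float, affected_s: float,
                         target_speedup: float) -> float:
    """Factor by which the affected part must speed up to reach target_speedup.

    Returns math.inf when no finite improvement is enough.
    """
    unaffected_s = total_s - affected_s
    target_s = total_s / target_speedup
    budget_s = target_s - unaffected_s     # time left over for the affected part
    if budget_s <= 0:
        return math.inf
    return affected_s / budget_s


print(required_improvement(100.0, 80.0, 5.0))  # prints: inf
print(required_improvement(100.0, 80.0, 4.0))  # prints: 16.0
```

A 5× overall speedup needs the program to finish in 20 s, but the 20 s outside multiply is untouched, so multiply would have to take zero time. Even a 4× overall speedup requires multiply to become 16× faster.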
Review Question
- Your machine has a clock rate of 2.4 GHz. How long is the clock cycle?
Review Questions
- Suppose you are given the following:
  - Machine A
    - 1 GHz
    - Average CPI = 1.6
    - Instructions = 1.7 billion
  - Machine B
    - 3.3 GHz
    - Average CPI = 6.1
    - Instructions = 2 billion
- Which machine is faster? By how much?
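One way to approach this question is sketched below, assuming CPU time = instruction count × CPI / clock rate. The function name is illustrative. The machine parameters are the ones listed above.

```python
def cpu_time_s(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


time_a = cpu_time_s(1.7e9, 1.6, 1.0e9)   # Machine A
time_b = cpu_time_s(2.0e9, 6.1, 3.3e9)   # Machine B
print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, ratio B/A: {time_b / time_a:.2f}")
# prints: A: 2.72 s, B: 3.70 s, ratio B/A: 1.36
```

Despite its much lower clock rate, Machine A finishes first because it needs far fewer cycles per instruction.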
Review Questions
- What is the average CPI for a machine with the following CPIs on an application with the following instruction frequency?

Type         Frequency   CPI
Arithmetic   0.45        1
Memory       0.3         8
Control      0.2         3
Mult/Div     0.05        5
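The table above can be worked as a frequency-weighted sum. This is a minimal sketch, and the tuple layout and variable names are illustrative.

```python
# (instruction type, relative frequency, CPI) rows from the table above
INSTRUCTION_MIX = [
    ("Arithmetic", 0.45, 1),
    ("Memory", 0.30, 8),
    ("Control", 0.20, 3),
    ("Mult/Div", 0.05, 5),
]

# The frequencies should account for every instruction executed.
total_frequency = sum(freq for _, freq, _ in INSTRUCTION_MIX)
assert abs(total_frequency - 1.0) < 1e-9

# Weighted average CPI: sum of (relative frequency x CPI) over all types.
avg_cpi = sum(freq * cpi for _, freq, cpi in INSTRUCTION_MIX)
print(f"Average CPI = {avg_cpi:.1f}")  # prints: Average CPI = 3.7
```

Memory instructions make up 30% of the mix but contribute 2.4 of the 3.7 cycles, so they dominate the average.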
Review Questions
- What factors must be included when comparing the relative performance of two machines?
Amdahl’s Law
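- Suppose you have an enhancement that makes a function 10x faster.
- Speedup if used 5% of the time?
- Speedup if used 40% of the time?

$$\text{Exec}_{\text{new}} = \text{Exec}_{\text{old}} \times \left( (1 - \text{fraction}_{\text{enh}}) + \frac{\text{fraction}_{\text{enh}}}{\text{Speedup}_{\text{enh}}} \right)$$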
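The two cases above can be evaluated with a short sketch of Amdahl's Law, written as overall speedup = Exec_old / Exec_new. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup when fraction_enh of the original time is sped up by speedup_enh."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must be between 0 and 1")
    if speedup_enh <= 0:
        raise ValueError("speedup_enh must be positive")
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


for fraction in (0.05, 0.40):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"used {fraction:.0%} of the time: overall speedup = {overall:.2f}")
# roughly 1.05 for 5% and 1.56 for 40%
```

Even a 10x improvement yields under 5% overall gain when the enhanced function accounts for only 5% of the run time.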
Review Questions
- What is the equation for execution time?
- What does Amdahl’s Law say?
Benchmarks
- Programs specifically used to measure performance
- Hope is that they are representative of how the computer will be used
- Examples
  - SPEC Integer and Floating Point
  - MediaBench
  - MineBench
  - TPC
SPEC CPU Benchmark
- Programs used to measure performance
  - Supposedly typical of actual workload
- Standard Performance Evaluation Corp (SPEC)
  - Develops benchmarks for CPU, I/O, Web, …
- SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)
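The normalize-then-summarize step above might look like the following sketch. It assumes each performance ratio is reference time divided by measured time. The timings are hypothetical values chosen for easy arithmetic, not real SPEC results, and the function names are illustrative.

```python
import math


def spec_ratios(reference_times, measured_times):
    """Per-benchmark performance ratio: reference time / measured time."""
    return [ref / t for ref, t in zip(reference_times, measured_times)]


def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical elapsed times in seconds for three benchmark programs
reference = [1000.0, 2000.0, 4000.0]
measured = [100.0, 100.0, 100.0]

ratios = spec_ratios(reference, measured)   # [10.0, 20.0, 40.0]
score = geometric_mean(ratios)              # cube root of 8000, about 20
print(f"ratios = {ratios}, geometric mean = {score:.2f}")
```

With a geometric mean, the comparison between two machines comes out the same whichever machine is used as the reference.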
CINT2006 for Intel Core i7 920
Recent Concern: Power Trends
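§1.7 The Power Wall
- In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

  - Power: ×30; Voltage: 5V → 1V; Frequency: ×1000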
Tricks to Increase Power
- Attach large cooling devices
- Turn off parts of chips not used in a given clock cycle
  - Can increase power to 300 watts...
  - ...but these and other approaches are all prohibitively expensive for desktop computers. So...
More Recent Approaches: Chip Multiprocessors
- Reasons for change
  - Limited opportunities to improve single-thread performance
  - Power
  - On-chip communication latencies
Tapering Processor Performance
Uniprocessor Performance
§1.8 The Sea Change: The Switch to Multiprocessors
Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
- Multicore microprocessors
  - More than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization
Concluding Remarks
- Cost/performance is improving
  - Due to underlying technology development
- Hierarchical layers of abstraction
  - In both hardware and software
- Instruction set architecture
  - The hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
  - Use parallelism to improve performance
§1.9 Concluding Remarks