CPU Performance: Lecture 8, CAP 3103, 06-11-2014 (§1.6 Performance)



slide-1
SLIDE 1

CPU Performance

Lecture 8 CAP 3103 06-11-2014

slide-2
SLIDE 2

Chapter 1 — Computer Abstractions and Technology — 2

Defining Performance

 Which airplane has the best performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 on Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph.]

§1.6 Performance

slide-3
SLIDE 3


Response Time and Throughput

 Response time

 How long it takes to do a task

 Throughput

 Total work done per unit time

 e.g., tasks/transactions/… per hour

 How are response time and throughput affected by

 Replacing the processor with a faster version?

 Adding more processors?

 We’ll focus on response time for now…

slide-4
SLIDE 4


Relative Performance

 Define Performance = 1/Execution Time

 “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

 Example: time taken to run a program

 10s on A, 15s on B

 $\text{Execution Time}_B / \text{Execution Time}_A = 15\text{s} / 10\text{s} = 1.5$

 So A is 1.5 times faster than B
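The ratio above can be checked with a few lines of Python. This is only a sketch: the function name `relative_performance` is made up for illustration, and it applies the slide's definition Performance = 1/Execution Time.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    With Performance = 1 / Execution Time, the ratio
    Performance_X / Performance_Y equals exec_time_y / exec_time_x.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"A is {n} times faster than B")  # prints: A is 1.5 times faster than B
```

Note that the slower machine's time goes in the numerator, so n is greater than 1 when X is the faster machine.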

slide-5
SLIDE 5


Measuring Execution Time

 Elapsed time

 Total response time, including all aspects

 Processing, I/O, OS overhead, idle time

 Determines system performance

 CPU time

 Time spent processing a given job

 Discounts I/O time, other jobs’ shares

 Comprises user CPU time and system CPU time

 Different programs are affected differently by CPU and system performance
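The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` (wall-clock time) and `time.process_time` (user plus system CPU time of the current process). The two workloads, `mostly_waiting` and `mostly_computing`, are hypothetical stand-ins chosen for illustration, and the measured numbers will vary by machine.

```python
import time


def measure(task):
    """Run task() once and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def mostly_waiting():
    # Stand-in for an I/O-bound job: the process sleeps and uses almost no CPU.
    time.sleep(0.2)


def mostly_computing():
    # Stand-in for a CPU-bound job: a pure computation with no waiting.
    sum(i * i for i in range(200_000))


for task in (mostly_waiting, mostly_computing):
    elapsed, cpu = measure(task)
    print(f"{task.__name__}: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

For the waiting job, elapsed time should be far larger than CPU time. For the computing job, the two should be close on an otherwise idle machine.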

slide-6
SLIDE 6


CPU Clocking

 Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram. Within each clock period, data transfer and computation take place, then state is updated.]

 Clock period: duration of a clock cycle

 e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s

 Clock frequency (rate): cycles per second

 e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
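Clock period and clock rate are reciprocals, which is why the two examples above describe the same clock. A minimal sketch of the conversion, with helper names invented for illustration:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


period = 250e-12                      # 250 ps, as on the slide
rate = clock_rate_hz(period)
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
print(f"{rate / 1e9:.1f} GHz -> {clock_period_s(rate) * 1e12:.0f} ps period")
```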

slide-7
SLIDE 7


CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

 Performance improved by

 Reducing number of clock cycles

 Increasing clock rate

 Hardware designer must often trade off clock rate against cycle count

slide-8
SLIDE 8


CPU Time Example

 Computer A: 2GHz clock, 10s CPU time

 Designing Computer B

 Aim for 6s CPU time

 Can do faster clock, but causes 1.2 × clock cycles

 How fast must Computer B clock be?
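One way to work this design question numerically is sketched below. It assumes only the relation CPU Time = Clock Cycles / Clock Rate, and the variable names are chosen for illustration.

```python
def cpu_time_s(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Computer A: 2 GHz clock, 10 s CPU time.
rate_a = 2e9
time_a = 10.0
cycles_a = time_a * rate_a            # cycles = time x rate = 20 x 10^9

# Computer B: needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
target_time_b = 6.0
rate_b = cycles_b / target_time_b     # rate = cycles / time

print(f"Computer B needs a clock of about {rate_b / 1e9:.1f} GHz")
print(f"Check: CPU time on B = {cpu_time_s(cycles_b, rate_b):.1f} s")
```

Computer B needs twice A's clock rate to hit the 6s target, because the extra 20% of cycles has to be absorbed as well.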

slide-9
SLIDE 9

CPU Time Example

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\text{s} \times 2\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\text{s}} = \frac{24 \times 10^{9}}{6\text{s}} = 4\text{GHz}$$

slide-10
SLIDE 10


Instruction Count and CPI

 Instruction Count for a program

 Determined by program, ISA and compiler

 Average cycles per instruction

 Determined by CPU hardware

 If different instructions have different CPI

 Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

slide-11
SLIDE 11


CPI Example

 Computer A: Cycle Time = 250ps, CPI = 2.0

 Computer B: Cycle Time = 500ps, CPI = 1.2

 Same ISA

 Which is faster, and by how much?

slide-12
SLIDE 12


CPI Example

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\text{ps} = I \times 500\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\text{ps} = I \times 600\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\text{ps}}{I \times 500\text{ps}} = 1.2$$

Here I is the instruction count, which is the same on both computers because they share an ISA. A is faster, by a factor of 1.2.
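The comparison can be reproduced with a short sketch built on CPU Time = Instruction Count × CPI × Clock Cycle Time. The instruction count used here is an arbitrary, hypothetical value; it cancels out of the ratio because both computers run the same program on the same ISA.

```python
def cpu_time_s(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical instruction count, used only to get concrete numbers.
I = 1_000_000

time_a = cpu_time_s(I, cpi=2.0, cycle_time_s=250e-12)  # I x 500 ps
time_b = cpu_time_s(I, cpi=1.2, cycle_time_s=500e-12)  # I x 600 ps

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"{faster} is faster by a factor of {ratio:.2f}")
```

A wins despite its higher CPI because its cycle time is half of B's.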

slide-13
SLIDE 13


CPI in More Detail

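 If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$

 Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

The fraction $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.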

slide-14
SLIDE 14


CPI Example

 Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

 Which code sequence executes the most instructions? Which one will be faster? What is the CPI for each sequence?

slide-15
SLIDE 15


CPI Example


 Sequence 1: IC = 5

 Clock Cycles = 2×1 + 1×2 + 2×3 = 10

 Avg. CPI = 10/5 = 2.0

 Sequence 2: IC = 6

 Clock Cycles = 4×1 + 1×2 + 1×3 = 9

 Avg. CPI = 9/6 = 1.5

 Sequence 2 executes more instructions but needs fewer clock cycles, so it is faster
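The per-class bookkeeping above can be sketched in code as follows. The dictionary layout and function names are illustrative choices, while the CPI values and instruction counts come from the slide's table.

```python
# Cycles per instruction for each instruction class (from the table).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_counts: dict) -> int:
    """Total cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in instruction_counts.items())


def average_cpi(instruction_counts: dict) -> float:
    """Weighted average CPI = total cycles / total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    ic = sum(seq.values())
    print(f"{name}: IC={ic}, cycles={clock_cycles(seq)}, avg CPI={average_cpi(seq)}")
# Sequence 1: IC=5, cycles=10, avg CPI=2.0
# Sequence 2: IC=6, cycles=9, avg CPI=1.5
```

Instruction count alone does not predict speed here: the sequence with more instructions uses cheaper instruction classes and finishes in fewer cycles.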

slide-16
SLIDE 16


Performance Summary

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

 Performance depends on

 Algorithm: affects IC, possibly CPI

 Programming language: affects IC, CPI

 Compiler: affects IC, CPI

 Instruction set architecture: affects IC, CPI, Tc

slide-17
SLIDE 17


Power Trends

§1.7 The Power Wall

 In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Annotations on the equation: Power ×30, Voltage 5V → 1V, Frequency ×1000

slide-18
SLIDE 18

Reducing Power

 Suppose we developed a new, simpler processor that has 85% of the capacitive load of the more complex older processor. Further, assume that it has adjustable voltage so that it can reduce voltage 15% compared to the older processor, which results in a 15% shrink in frequency.

 What is the impact on dynamic power?
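The question can be explored numerically with a small sketch. It assumes the CMOS relation Power = Capacitive load × Voltage² × Frequency and works with scale factors relative to the older processor. The function name is hypothetical.

```python
def dynamic_power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """Ratio P_new / P_old when each factor is scaled relative to the old CPU.

    Power = C x V^2 x F, so the old values cancel and only the
    scale factors remain: cap_scale x volt_scale^2 x freq_scale.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


# New CPU: 85% of the capacitive load, voltage and frequency each reduced by 15%.
ratio = dynamic_power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"New CPU uses about {ratio:.2f} of the old CPU's dynamic power")
```

Because voltage enters squared, the three 15% reductions multiply out to 0.85⁴, roughly half the original power.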


slide-19
SLIDE 19


Reducing Power

 Suppose a new CPU has

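 85% of capacitive load of old CPU

 15% voltage and 15% frequency reduction

$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$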

 The power wall

 We can’t reduce voltage further

 We can’t remove more heat

 How else can we improve performance?