CSEE 3827: Fundamentals of Computer Systems, Lectures 18, 19, & 20

SLIDE 1

CSEE 3827: Fundamentals of Computer Systems

Lectures 18, 19, & 20, April 2009
Martha Kim, martha@cs.columbia.edu

SLIDE 2

CSEE 3827, Spring 2009 Martha Kim

Outline

  • We will examine two MIPS implementations:
    • A single-cycle version
    • A pipelined version
  • Simple subset of MIPS, showing most aspects:
    • Memory reference: lw, sw
    • Arithmetic/logical: add, sub, and, or, slt
    • Control transfer: beq, j
  • CPU performance factors:
    • Instruction count (determined by ISA and compiler)
    • Cycles per instruction and cycle time (determined by CPU hardware)

SLIDE 3

Instruction Execution

  • PC → instruction memory: fetch instruction
  • Register numbers → register file: read registers
  • Depending on instruction class:
    • Use ALU to calculate:
      • Arithmetic or logical result
      • Memory address for load/store
      • Branch target address
    • Access data memory for load/store
    • PC ← target address or PC + 4

SLIDE 4

CPU Overview

SLIDE 5

Can’t just join wires together; use multiplexers

SLIDE 6

Control

SLIDE 7

MIPS Datapath

SLIDE 8

Combinational Elements

  • AND gate (Y = A & B)
  • Multiplexer (Y = S ? A : B)
  • Adder (Y = A + B)
  • Arithmetic/Logic Unit (ALU) (Y = F(A, B), where F selects the function)
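Each of these elements is a pure function of its current inputs, with no clock and no stored state. The minimal Python sketch below models them that way, assuming 32-bit values and a small illustrative set of ALU functions; the function names are hypothetical, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF  # keep every result to 32 bits, like a 32-bit datapath


def and_gate(a: int, b: int) -> int:
    """Bitwise AND of two 32-bit values (Y = A & B)."""
    return a & b & MASK32


def mux(s: int, a: int, b: int) -> int:
    """2-to-1 multiplexer using the slide's convention Y = S ? A : B."""
    return a if s else b


def adder(a: int, b: int) -> int:
    """32-bit adder; the carry out of bit 31 is discarded."""
    return (a + b) & MASK32


def alu(f: str, a: int, b: int) -> int:
    """ALU computing Y = F(A, B) for a few illustrative functions F."""
    ops = {
        "and": lambda x, y: x & y,
        "or": lambda x, y: x | y,
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
    }
    return ops[f](a, b) & MASK32
```

Because the outputs depend only on the inputs, any chain of these functions behaves like a block of combinational logic: it settles to a result, and nothing is remembered between calls.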

SLIDE 9

Clocking Methodology

Combinational logic transforms data during clock cycles. Longest combinational delay determines clock period.

SLIDE 10

Building a datapath incrementally

  • Datapath: elements that process data and addresses in the CPU
  • The datapath will execute one instruction in one clock cycle
    • Each datapath element can perform only one function per cycle
    • Hence, we need separate instruction and data memories
    • Use multiplexers where alternate data sources are used for different instructions

SLIDE 11

Instruction Fetch

  • Fetch the instruction from instruction memory at the address held in the PC
  • Compute PC + 4, the address of the next sequential instruction
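A minimal Python sketch of the fetch step, assuming instruction memory is modelled as a dictionary from word-aligned byte addresses to 32-bit instruction words (a hypothetical stand-in for the instruction memory unit). The two sample words encode `add $t0, $t1, $t2` and `lw $t0, 4($t1)`.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical instruction memory: byte address -> 32-bit instruction word.
# Instructions are 4 bytes long, so consecutive instructions are 4 addresses apart.
imem = {
    0x00400000: 0x012A4020,  # add $t0, $t1, $t2
    0x00400004: 0x8D280004,  # lw  $t0, 4($t1)
}


def fetch(imem: dict[int, int], pc: int) -> tuple[int, int]:
    """Read the instruction addressed by the PC and compute PC + 4."""
    instruction = imem[pc]
    next_pc = (pc + 4) & MASK32
    return instruction, next_pc


instr, pc = fetch(imem, 0x00400000)  # returns the add word and PC 0x00400004
```

In hardware the memory read and the PC + 4 addition happen in parallel; the sketch just computes both values in one call.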

SLIDE 12

Part 1: Instruction Fetch

SLIDE 13

R-Format Instructions

  • Read two register operands
  • Perform arithmetic/logical operation
  • Write register result
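The three R-format steps (read two registers, operate, write one register) can be sketched in Python as follows. This is a minimal model, assuming a 32-entry list as the register file; the helper names are hypothetical. It relies on the MIPS convention that register 0 (`$zero`) always reads as 0, so writes to it are dropped.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def execute_r_type(regs: list[int], rs: int, rt: int, rd: int, op: str) -> None:
    """Read registers rs and rt, apply the ALU operation, write the result to rd."""
    a, b = regs[rs], regs[rt]                  # read two register operands
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "slt":                          # signed comparison
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unsupported R-type operation: {op}")
    if rd != 0:                                # $zero is hard-wired to 0
        regs[rd] = result & MASK32             # write register result
```

For example, with `regs[9] = 7` and `regs[10] = 5`, `execute_r_type(regs, 9, 10, 8, "sub")` leaves 2 in register 8.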

SLIDE 14

Load/Store Instructions

  • Read register operands
  • Calculate address using 16-bit offset (use ALU but sign-extend offset)
  • Load: read memory and update register
  • Store: write register value to memory
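The key detail is the sign extension: a 16-bit offset such as 0xFFFC means −4, so it must become 0xFFFFFFFC before the 32-bit add. A minimal Python sketch, assuming data memory is a dictionary of word addresses and that never-written addresses read as 0 (both are modelling assumptions; the function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def effective_address(base: int, offset16: int) -> int:
    """The ALU adds the base register and the sign-extended offset."""
    return (base + sign_extend16(offset16)) & MASK32


def load_word(regs: list[int], dmem: dict[int, int], rs: int, rt: int, offset16: int) -> None:
    """lw rt, offset(rs): read data memory and update register rt."""
    value = dmem.get(effective_address(regs[rs], offset16), 0)  # assumption: unwritten = 0
    if rt != 0:                                # $zero is never written
        regs[rt] = value


def store_word(regs: list[int], dmem: dict[int, int], rs: int, rt: int, offset16: int) -> None:
    """sw rt, offset(rs): write register rt's value to data memory."""
    dmem[effective_address(regs[rs], offset16)] = regs[rt]
```

With base register value 0x1000 and offset 0xFFFC, both functions access address 0x0FFC, four bytes below the base.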

SLIDE 15

Part 2: R-Type/Load/Store Datapath

SLIDE 16

Branch Instructions

  • Read register operands
  • Compare operands
    • Use ALU: subtract and check the Zero output
  • Calculate target address
    • Sign-extend displacement
    • Shift left two places (word displacement)
    • Add to PC + 4 (already calculated by instruction fetch)
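The target calculation and the Zero test can be sketched in Python as below. This is a minimal model with hypothetical function names; it shows why the displacement is shifted left by two: the 16-bit field counts words, while the PC holds byte addresses.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended word displacement << 2)."""
    pc_plus_4 = (pc + 4) & MASK32                          # already computed by fetch
    byte_disp = (sign_extend16(offset16) << 2) & MASK32    # words -> bytes
    return (pc_plus_4 + byte_disp) & MASK32


def beq_next_pc(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """ALU subtracts the operands; the branch is taken if the Zero output is set."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32
```

A displacement of 0xFFFF (−1 word) at address 0x00400000 branches back to 0x00400000 itself, because the displacement is added to PC + 4, not to the PC.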

SLIDE 17

Part 3: Instruction Fetch w. Branch

SLIDE 18

Full Datapath

SLIDE 19

MIPS Datapath Control

SLIDE 20

Datapath Control Scheme

  • Main control drives the whole datapath, based on the opcode
  • ALU control drives the ALU, based on ALUOp (a 2-bit signal that main control derives from the opcode) and the function field (funct)

SLIDE 21

ALU Control Inputs/Outputs

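  • Main control produces the 2-bit ALUOp signal from the opcode:
    • R-type → 10
    • lw → 00
    • sw → 00
    • beq → 01
  • ALU control combines ALUOp with the funct field, Instruction[5:0], to produce the 4-bit ALU Operation input:
    • 0000 → AND
    • 0001 → OR
    • 0010 → add
    • 0110 → subtract
    • 0111 → set on less than

(See Appendix C of the text for the implementation of the corresponding ALU.)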

SLIDE 22

ALU Control Implementation

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          xxxxxx   add                0010
sw       00      store word         xxxxxx   add                0010
beq      01      branch equal       xxxxxx   subtract           0110
R-type   10      add                100000   add                0010
R-type   10      subtract           100010   subtract           0110
R-type   10      AND                100100   AND                0000
R-type   10      OR                 100101   OR                 0001
R-type   10      set on less than   101010   set on less than   0111

(ALUOp comes from main control; funct is Instruction[5:0]; the ALU control output drives the ALU's Operation input.)
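The table above translates directly into a lookup function. A minimal Python sketch (the function name is hypothetical): for ALUOp 00 and 01 the funct field is a don't-care, so it is ignored; only ALUOp 10 consults it.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and 6-bit funct field to the 4-bit ALU Operation."""
    if alu_op == 0b00:          # lw / sw: add base + offset
        return 0b0010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:          # R-type: decode the funct field
        funct_table = {
            0b100000: 0b0010,   # add
            0b100010: 0b0110,   # subtract
            0b100100: 0b0000,   # AND
            0b100101: 0b0001,   # OR
            0b101010: 0b0111,   # set on less than
        }
        return funct_table[funct]
    raise ValueError(f"unused ALUOp value: {alu_op:02b}")
```

Writing it as nested choices mirrors the two-level control scheme: main control settles the coarse case, and ALU control only decodes funct when ALUOp says the instruction is R-type.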

SLIDE 23

ALU Control Truth Table

ALUOp   funct (Instruction[5:0])   Operation
00      xxxxxx                     0010
00      xxxxxx                     0010
01      xxxxxx                     0110
10      100000                     0010
10      100010                     0110
10      100100                     0000
10      100101                     0001
10      101010                     0111

SLIDE 24

ALU Control Truth Table 2

SLIDE 25

Datapath Control Scheme

SLIDE 26

Main control signals derive from instruction types

Bits:         31:26      25:21   20:16   15:11   10:6    5:0
R-type:       0          rs      rt      rd      shamt   funct
Load/Store:   35 or 43   rs      rt      constant (15:0)
Branch:       4          rs      rt      constant (15:0)

  • Opcode (31:26): always read
  • rs (25:21): always read
  • rt (20:16): read, except for load
  • Destination register: written for R-type (rd) and load (rt)
  • Constant (15:0): sign-extend and add
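Because every format places the opcode, rs, and rt in the same bit positions, a decoder can extract all fields at once and let control decide which ones matter. A minimal Python sketch with hypothetical function names, using the two sample encodings `add $t0, $t1, $t2` (0x012A4020) and `lw $t0, 4($t1)` (0x8D280004):

```python
def decode_fields(instr: int) -> dict[str, int]:
    """Slice a 32-bit MIPS instruction into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11 (R-type only)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct": instr & 0x3F,           # bits 5:0   (R-type only)
        "imm": instr & 0xFFFF,           # bits 15:0  (load/store/branch)
    }


def write_register(fields: dict[str, int]) -> int:
    """R-type writes rd; load writes rt (the choice a RegDst-style mux makes)."""
    return fields["rd"] if fields["opcode"] == 0 else fields["rt"]
```

Extracting every field unconditionally matches the hardware, which wires all bit ranges to the register file and sign extender at once; unused values are simply ignored.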

SLIDE 27

R-Type Control Signals

(Figure: datapath with R-type control settings; ALUOp = 10, RegDst = 1, RegWrite = 1)

(Alt. illustration: Fig. 4.19)

SLIDE 28

lw Control Signals

(Figure: datapath with lw control settings; ALUOp = 00, ALUSrc = 1, MemtoReg = 1, RegWrite = 1, MemRead = 1)

(Alt. illustration: Fig. 4.20)

SLIDE 29

sw Control Signals

(Figure: datapath with sw control settings; ALUSrc = 1, ALUOp = 00, RegDst = x, MemtoReg = x, MemWrite = 1)

SLIDE 30

beq Control Signals

(Figure: datapath with beq control settings; Branch = 1, ALUOp = 01, RegDst = x, MemtoReg = x)

(Alt. illustration: Fig. 4.21)

SLIDE 31

Main Control Truth Table

(Figure: main control truth table, with input Instruction[31:26] = 000000 (R-format), 100011 (lw), 101011 (sw), 000100 (beq))
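Slides 27 through 30 show only some of the signal values for each instruction. The sketch below fills in the complete table using the signal names and settings of the textbook's single-cycle design (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp); treat those names and the deasserted (0) entries as assumptions drawn from that design, not from these slides. `None` marks a don't-care (x).

```python
# Signal names follow the textbook's single-cycle design (an assumption here).
# None marks a don't-care (x).
MAIN_CONTROL = {
    0b000000: {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,
               "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},  # R-format
    0b100011: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,
               "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},  # lw (35)
    0b101011: {"RegDst": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0,
               "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},  # sw (43)
    0b000100: {"RegDst": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0,
               "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},  # beq (4)
}


def main_control(opcode: int) -> dict:
    """Look up the control settings for Instruction[31:26]."""
    try:
        return MAIN_CONTROL[opcode]
    except KeyError:
        raise ValueError(f"opcode {opcode:06b} not in this subset") from None
```

The asserted values agree with the per-instruction slides: ALUOp 10 with RegDst and RegWrite for R-type; ALUSrc, MemtoReg, RegWrite, and MemRead for lw; ALUSrc and MemWrite for sw; Branch for beq.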

SLIDE 32

Implementing Jumps

SLIDE 33

The j instruction

  • j label
  • Unconditional jump to the instruction at label
  • Instruction encoded in J-type format:
    • Bits 31:26: opcode 2
    • Bits 25:0: address
  • Jump uses word addresses
  • Update PC with concatenation of:
    • Top 4 bits of the old PC (strictly, of PC + 4)
    • 26-bit jump address
    • 00
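The concatenation can be written with shifts and masks. A minimal Python sketch (the function name is hypothetical). It takes the top 4 bits from PC + 4, as the MIPS definition of j does; these match the old PC's top bits unless the jump is the last instruction in a 256 MB region.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc: int, instr: int) -> int:
    """New PC = top 4 bits of (PC + 4) || 26-bit address || 00."""
    addr26 = instr & 0x03FFFFFF               # bits 25:0 of the instruction
    upper4 = ((pc + 4) & MASK32) & 0xF0000000  # bits 31:28 of PC + 4
    return upper4 | (addr26 << 2)             # appending 00 = shift left by 2
```

For the word 0x08100000 (opcode 2, address field 0x100000), a jump at 0x00400010 lands at 0x00400000, while the same instruction at 0x10000000 lands at 0x10400000: the target stays inside the current 256 MB region.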

SLIDE 34

Implementing the jump instruction

SLIDE 35

Implementing the jump instruction: in-class solution

SLIDE 36

CPU Performance

SLIDE 37

Understanding Performance

  • Algorithm → determines number of operations executed
  • Programming language, compiler, architecture → determine number of machine instructions executed per operation
  • Processor and memory system → determine how fast instructions are executed
  • I/O system (including OS) → determines how fast I/O operations are executed

SLIDE 38

Defining Performance

  • Which airplane has the best performance?

(Figure: bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph)

SLIDE 39

Response Time and Throughput

  • Response time: how long it takes to do a task, sometimes also called latency [time/work]
  • Throughput: total work done per unit time [work/time]

How are response time and throughput affected by...
  • Replacing the processor with a faster version?
  • Adding more processors?

For now, we’ll focus on response time.

SLIDE 40

Relative Performance

Define: Performance = 1 / Execution Time

“X is n times faster than Y” means:
Performance X / Performance Y = Execution Time Y / Execution Time X = n

Example: a program takes 10 s to run on machine A and 15 s on machine B.
Execution Time B / Execution Time A = 15 s / 10 s = 1.5, so “A is 1.5 times faster than B”.
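The definition and the example can be checked in a few lines of Python. A minimal sketch with hypothetical function names; it shows that taking the ratio of performances is the same as inverting the ratio of execution times.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as 1 / execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n in "X is n times faster than Y"."""
    return performance(time_x_s) / performance(time_y_s)  # = time_y / time_x


n = times_faster(10.0, 15.0)  # machine A: 10 s, machine B: 15 s
print(f"A is {n:.1f} times faster than B")  # prints: A is 1.5 times faster than B
```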

SLIDE 41

Measuring Execution Time

Define: Elapsed Time
  • Total response time, including all aspects (processing, I/O, overhead, idle time)

Define: CPU Time
  • Time spent processing a given job (discounts I/O time and other jobs’ shares)
  • Elapsed Time > CPU Time

SLIDE 42

CPU Clocking

Operation of digital hardware is governed by a constant-rate clock.

(Figure: clock waveform; within each clock period, data transfer and computation take place, then state is updated at the clock edge)

  • Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns
  • Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz
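Clock period and clock rate are reciprocals, so the two examples describe the same clock: 1 / (4.0 × 10^9 Hz) = 250 × 10^−12 s. A minimal Python sketch of the conversion (function names are hypothetical):

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period (seconds) is the reciprocal of clock rate (Hz)."""
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate (Hz) is the reciprocal of clock period (seconds)."""
    return 1.0 / period_s


period = period_from_rate(4.0e9)  # 4.0 GHz
print(f"{period * 1e12:.0f} ps = {period * 1e9:.2f} ns")  # prints: 250 ps = 0.25 ns
```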

SLIDE 43

CPU Time

CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate

Performance improved by:

  • 1. Reducing number of clock cycles
  • 2. Increasing clock rate (reducing clock period)

Hardware designer must often trade off clock rate against cycle count.

SLIDE 44

CPU Time Example

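Computer A: 2 GHz clock, 10 s CPU time
Designing Computer B:
  • Aim for 6 s CPU time
  • Can use a faster clock, but the clock rate increase makes B need 1.2× as many clock cycles as A

How fast must Computer B’s clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$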
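The same three steps as a Python function, a minimal sketch with a hypothetical name and parameters: recover A's cycle count from its time and rate, scale by the cycle-count penalty, then divide by B's target time.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs to hit target_time_s, given B uses cycle_factor x A's cycles."""
    cycles_a = time_a_s * rate_a_hz      # Clock Cycles_A = CPU Time_A x Clock Rate_A
    cycles_b = cycle_factor * cycles_a   # B needs 1.2x as many cycles
    return cycles_b / target_time_s      # Clock Rate_B = Clock Cycles_B / CPU Time_B


rate_b = required_clock_rate(10.0, 2e9, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # prints: 4.0 GHz
```

Note that B's clock must double, not merely rise by 10/6, because the faster clock also costs 20% more cycles.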

SLIDE 45

Instruction Count and CPI

Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate

  • Instruction count
    • Determined by program, ISA, and compiler
  • Average cycles per instruction (CPI)
    • Determined by CPU hardware
    • If different instructions have different CPI, compute a weighted average based on the instruction mix
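The weighted average is total cycles divided by total instructions. A minimal Python sketch; the function names and the three-class instruction mix are hypothetical, chosen only to illustrate the arithmetic.

```python
def weighted_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """mix maps instruction class -> (instruction count, CPI for that class)."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    return total_cycles / total_instructions


def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = (Instruction Count x CPI) / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical mix: 2 class-A (CPI 1), 1 class-B (CPI 2), 2 class-C (CPI 3) instructions.
mix = {"A": (2, 1), "B": (1, 2), "C": (2, 3)}
avg = weighted_cpi(mix)  # (2 + 2 + 6) cycles / 5 instructions = 2.0
```

Weighting by count matters: a plain average of the three class CPIs would also give 2.0 here only by coincidence; change the counts and the two diverge.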

SLIDE 46

CPI Example

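Computer A: cycle time = 250 ps, CPI = 2.0
Computer B: cycle time = 500 ps, CPI = 1.2
Same ISA. Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

A is faster, by a factor of 1.2.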
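Because both machines run the same ISA, the instruction count I is the same on each and cancels in the ratio. The Python sketch below (hypothetical names; the value of I is arbitrary) shows that any I gives the same answer.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # arbitrary; it cancels in the ratio
time_a = cpu_time(I, 2.0, 250e-12)  # I x 500 ps
time_b = cpu_time(I, 1.2, 500e-12)  # I x 600 ps
speedup = time_b / time_a
print(f"A is {speedup:.1f} times faster than B")  # prints: A is 1.2 times faster than B
```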

SLIDE 47

Amdahl’s Law

Be aware when optimizing:

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

Example: on machine A, multiplication accounts for 80 s out of 100 s total CPU time. How much must multiplication performance improve to get a 5× speedup overall?

Corollary: make the common case fast.
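Working the example through the formula: a 5× overall speedup means a 100 s / 5 = 20 s target, but the 20 s that is not multiplication is unaffected, so 80 s / n + 20 s = 20 s has no finite solution for n. The Python sketch below (hypothetical function names) solves the formula for the required factor and returns `None` when the target is unreachable.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Solve target = t_affected / n + t_unaffected for n; None if impossible."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # even an infinitely fast affected part cannot reach the target
    return t_affected / remaining


print(required_factor(80.0, 20.0, 100.0 / 5))  # 5x overall: prints None
print(required_factor(80.0, 20.0, 100.0 / 4))  # 4x overall: prints 16.0
```

Even the 4× overall target needs multiplication to get 16× faster, which is the corollary in action: the unaffected 20 s quickly dominates.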

SLIDE 48

Performance Summary

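$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Performance depends on all of these terms. The algorithm, programming language, and compiler affect the first two (instructions per program and clock cycles per instruction). The ISA affects all three.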

SLIDE 49

Single-Cycle CPU Performance Issues

  • Longest delay determines clock period
    • Critical path: load instruction
    • Instruction memory → register file → ALU → data memory → register file
  • Not feasible to vary clock period for different instructions
  • We will improve performance by pipelining
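A minimal Python sketch of why the load sets the clock period. The per-unit delays are hypothetical numbers chosen for illustration (they are not from these slides); the paths list which units each instruction class passes through in the single-cycle datapath.

```python
# Hypothetical unit delays in picoseconds; these numbers are assumptions.
DELAYS_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units each instruction class passes through in the single-cycle datapath.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
}


def path_delay(units: list[str]) -> int:
    """Sum the delays of the units along one instruction's path."""
    return sum(DELAYS_PS[u] for u in units)


def single_cycle_period() -> tuple[str, int]:
    """Every instruction gets the same clock period: the longest path's delay."""
    worst = max(PATHS, key=lambda name: path_delay(PATHS[name]))
    return worst, path_delay(PATHS[worst])
```

Under these assumed delays the period is 800 ps, set by lw, so a beq that needs only 500 ps still occupies a full 800 ps cycle. That wasted time is what pipelining recovers.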

SLIDE 50

Pipelining Preview: Laundry Analogy
