[PPT] - CSEE 3827: Fundamentals of Computer Systems Lecture 18, 19, & 20 PowerPoint Presentation

SLIDE 1

CSEE 3827: Fundamentals of Computer Systems

Lecture 18, 19, & 20 April 2009 Martha Kim martha@cs.columbia.edu

SLIDE 2

CSEE 3827, Spring 2009 Martha Kim

Outline

We will examine two MIPS implementations
A single-cycle version
A pipelined version
Simple subset of MIPS, showing most aspects
Memory reference: lw, sw
Arithmetic/logical: add, sub, and, or, slt
Control transfer: beq, j
CPU performance factors
Instruction count (determined by ISA and compiler)
Cycles per instruction and cycle time (determined by CPU hardware)


SLIDE 3


Instruction Execution

PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class:
Use ALU to calculate:
Arithmetic or logical result
Memory address for load/store
Branch target address
Access data for load/store
PC ← target address or PC + 4


SLIDE 4

CPU Overview

SLIDE 5

Can’t just join wires together; use muxes

SLIDE 6

Control

SLIDE 7

MIPS Datapath

SLIDE 8

Combinational Elements

AND gate (Y = A & B)
Multiplexer (Y = S ? A : B)
Adder (Y = A + B)
Arithmetic/Logic Unit (ALU) (Y = F(A, B))

SLIDE 9

Clocking Methodology

Combinational logic transforms data during clock cycles, between clock edges.
The longest combinational delay determines the minimum clock period.

SLIDE 10


Building a datapath incrementally

Datapath: elements that process data and addresses in the CPU
Datapath will execute one instruction in one clock cycle
Each datapath element can only do one function at a time
Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions

SLIDE 11


Instruction Fetch

Fetch the instruction at the address held in the PC register from memory
Compute PC + 4 for next instruction

SLIDE 12

Part 1: Instruction Fetch

SLIDE 13


R-Format Instructions

Read two register operands
Perform arithmetic/logical operation
Write register result


SLIDE 14

Load/Store Instructions

Read register operands
Calculate address using 16-bit offset (use ALU but sign-extend offset)
Load: read memory and update register
Store: write register value to memory
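The address calculation above can be sketched as follows. This is a minimal illustration, assuming 32-bit addresses; the helper names `sign_extend16` and `effective_address` are hypothetical and do not come from the slides.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """Base register value plus sign-extended 16-bit offset, wrapped to 32 bits.

    This models the ALU add used by lw and sw.
    """
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# An offset field of 0xFFFC encodes -4, so the access lands 4 bytes below the base.
print(hex(effective_address(0x1000, 0xFFFC)))
print(hex(effective_address(0x1000, 8)))
```

Without the sign extension, a negative offset such as 0xFFFC would be treated as +65532 and the access would land far above the base address.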

SLIDE 15

Part 2: R-Type/Load/Store Datapath

SLIDE 16


Branch Instructions

Read register operands
Compare operands (use ALU: subtract and check zero output)
Calculate target address
Sign-extend displacement
Shift left two places (word displacement)
Add to PC+4 (already calculated by instruction fetch)
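The branch steps listed above can be sketched in a few lines. This is an illustrative model only, assuming 32-bit values; the function names `branch_target` and `next_pc` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, displacement16: int) -> int:
    """Sign-extend the word displacement, shift left by two, add to PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 + (sign_extend16(displacement16) << 2)) & MASK32


def next_pc(pc: int, rs_value: int, rt_value: int, displacement16: int) -> int:
    """beq: subtract the operands in the ALU and branch if the result is zero."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, displacement16) if zero else (pc + 4) & MASK32


print(hex(next_pc(0x100, 5, 5, 2)))  # operands equal: branch taken
print(hex(next_pc(0x100, 5, 6, 2)))  # operands differ: fall through to PC + 4
```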


SLIDE 17

Part 3: Instruction Fetch w. Branch

SLIDE 18

Full Datapath

SLIDE 19

MIPS Datapath Control

SLIDE 20

Datapath Control Scheme

Main control sets the control signals for the whole datapath based on the opcode
ALU control selects the ALU operation based on ALUOp (derived from the opcode by main control) and the function field (funct)

SLIDE 21

ALU Control Inputs/Outputs

Inputs to ALU control:
ALUOp (2 bits, from main control): lw → 00, sw → 00, beq → 01, R-type → 10
funct (Instruction[5:0])

Output of ALU control:
Operation (4 bits, to ALU): 0000 → AND, 0001 → OR, 0010 → add, 0110 → subtract, 0111 → set on less than

(See Appendix C of text for implementation of corresponding ALU.)
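To make the 4-bit Operation codes concrete, here is a minimal behavioral sketch of an ALU that accepts them. It assumes 32-bit operands, a signed comparison for set on less than, and a zero output flag; the function name `alu` is hypothetical, and this is not the Appendix C implementation.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(operation: int, a: int, b: int):
    """Return (result, zero) for the Operation codes listed on the slide."""
    if operation == 0b0000:    # AND
        result = a & b
    elif operation == 0b0001:  # OR
        result = a | b
    elif operation == 0b0010:  # add
        result = a + b
    elif operation == 0b0110:  # subtract
        result = a - b
    elif operation == 0b0111:  # set on less than (signed, by assumption)
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    else:
        raise ValueError(f"unsupported ALU operation {operation:04b}")
    result &= MASK32
    return result, result == 0


print(alu(0b0110, 5, 5))  # equal operands give a zero result, as used by beq
```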

SLIDE 22

ALU Control Implementation

opcode   ALUOp (from main control)   Instruction operation   Instruction[5:0]   ALU function       Operation
lw       00                          load word               xxxxxx             add                0010
sw       00                          store word              xxxxxx             add                0010
beq      01                          branch equal            xxxxxx             subtract           0110
R-type   10                          add                     100000             add                0010
R-type   10                          subtract                100010             subtract           0110
R-type   10                          AND                     100100             AND                0000
R-type   10                          OR                      100101             OR                 0001
R-type   10                          set on less than        101010             set on less than   0111

SLIDE 23

ALU Control Truth Table

ALUOp (from main control)   Instruction[5:0]   Operation
00                          xxxxxx             0010
00                          xxxxxx             0010
01                          xxxxxx             0110
10                          100000             0010
10                          100010             0110
10                          100100             0000
10                          100101             0001
10                          101010             0111
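The truth table can be written as a small lookup function. This is a sketch rather than the gate-level implementation; the names `alu_control` and `R_TYPE_OPERATION` are hypothetical, and the bit patterns are the ones listed on the slide.

```python
# funct field (Instruction[5:0]) -> 4-bit ALU Operation, used when ALUOp = 10
R_TYPE_OPERATION = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map ALUOp and funct to the 4-bit ALU Operation."""
    if alu_op == 0b00:  # lw / sw: add to form the address, funct is a don't-care
        return 0b0010
    if alu_op == 0b01:  # beq: subtract to compare, funct is a don't-care
        return 0b0110
    if alu_op == 0b10:  # R-type: the funct field decides
        return R_TYPE_OPERATION[funct]
    raise ValueError(f"unexpected ALUOp {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "04b"))
```

Splitting the decode into two levels keeps main control small: it only has to produce two ALUOp bits, and the funct field is examined only for R-type instructions.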

SLIDE 24

ALU Control Truth Table 2

SLIDE 25

Datapath Control Scheme

SLIDE 26

Main control signals derive from instruction types

Format       31:26      25:21   20:16   15:11   10:6    5:0
R-type       0          rs      rt      rd      shamt   funct
Load/Store   35 or 43   rs      rt      constant (15:0)
Branch       4          rs      rt      constant (15:0)

rs (25:21): always read
rt (20:16): read, except for load
rd (15:11) for R-type, rt (20:16) for load: write
constant (15:0): sign-extend and add
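The bit ranges above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit instruction word; the function name `decode_fields` and the dictionary keys are hypothetical, and the two sample encodings (`add $t0, $s1, $s2` and `lw $t0, 32($s3)`) are standard MIPS encodings rather than examples taken from the slides.

```python
def decode_fields(instruction: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,    # bits 31:26
        "rs": (instruction >> 21) & 0x1F,        # bits 25:21
        "rt": (instruction >> 16) & 0x1F,        # bits 20:16
        "rd": (instruction >> 11) & 0x1F,        # bits 15:11 (R-type only)
        "shamt": (instruction >> 6) & 0x1F,      # bits 10:6  (R-type only)
        "funct": instruction & 0x3F,             # bits 5:0   (R-type only)
        "constant": instruction & 0xFFFF,        # bits 15:0  (load/store/branch)
    }


add_fields = decode_fields(0x02324020)  # add $t0, $s1, $s2
lw_fields = decode_fields(0x8E680020)   # lw $t0, 32($s3)
print(add_fields["opcode"], add_fields["rd"], add_fields["funct"])
print(lw_fields["opcode"], lw_fields["rt"], lw_fields["constant"])
```

The hardware extracts all fields in parallel regardless of format; control then decides which of them are meaningful for the instruction at hand.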

SLIDE 27

R-Type Control Signals

ALUOp = 10, RegDst = 1, RegWrite = 1

(Alt. illustration: Fig. 4.19)

SLIDE 28

lw Control Signals

ALUOp = 00, ALUSrc = 1, MemRead = 1, MemtoReg = 1, RegWrite = 1

(Alt. illustration: Fig. 4.20)

SLIDE 29

sw Control Signals

ALUSrc = 1, ALUOp = 00, RegDst = x, MemtoReg = x, MemWrite = 1

SLIDE 30

beq Control Signals

Branch = 1, ALUOp = 01, RegDst = x, MemtoReg = x

(Alt. illustration: Fig. 4.21)

SLIDE 31

Main Control Truth Table

Input: opcode, Instruction[31:26]
R-type: 000000
lw: 100011
sw: 101011
beq: 000100
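Main control can be modeled as a lookup from opcode to control signals. The sketch below is an assumption-laden illustration: the signal names (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp) and their values follow the usual textbook single-cycle MIPS control table rather than anything legible in this transcript, and `ControlSignals` and `main_control` are hypothetical names. Don't-care entries are represented as `None`.

```python
from typing import NamedTuple, Optional


class ControlSignals(NamedTuple):
    reg_dst: Optional[int]     # RegDst: 1 selects rd as the write register
    alu_src: int               # ALUSrc: 1 selects the sign-extended constant
    mem_to_reg: Optional[int]  # MemtoReg: 1 writes back the memory data
    reg_write: int             # RegWrite
    mem_read: int              # MemRead
    mem_write: int             # MemWrite
    branch: int                # Branch
    alu_op: int                # ALUOp (2 bits)


# Assumed values, taken from the standard single-cycle control table.
MAIN_CONTROL = {
    0b000000: ControlSignals(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-type
    0b100011: ControlSignals(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: ControlSignals(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: ControlSignals(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(opcode: int) -> ControlSignals:
    """Look up the control signals for a 6-bit opcode."""
    return MAIN_CONTROL[opcode]


print(main_control(0b100011))
```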

SLIDE 32

Implementing Jumps

SLIDE 33

The j instruction

j label

Format: opcode 2 in bits 31:26, address in bits 25:0

Unconditional jump to instruction at label
Instruction encoded in J-type format
Jump uses word addresses
Update PC with concatenation of:
Top 4 bits of old PC
26-bit jump address
00
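The PC update for a jump is a pure bit concatenation, sketched below. The function name `jump_target` is hypothetical. The sketch follows the slide in taking the top 4 bits from the old PC; descriptions of MIPS commonly take them from PC + 4 instead, which gives the same result except at a 256 MB region boundary.

```python
def jump_target(pc: int, address26: int) -> int:
    """Concatenate the top 4 bits of the PC, the 26-bit address, and 00."""
    upper = pc & 0xF0000000                  # top 4 bits of the old PC
    word_address = address26 & 0x03FFFFFF    # 26-bit field from the instruction
    return upper | (word_address << 2)       # shifting left by 2 appends 00


print(hex(jump_target(0x00400000, 0x0100000)))
print(hex(jump_target(0x10000008, 1)))
```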

SLIDE 34

Implementing the jump instruction

SLIDE 35

Implementing the jump instruction: in-class solution

SLIDE 36

CPU Performance

SLIDE 37

Understanding Performance

Algorithm → determines number of operations executed
Programming language, compiler, architecture → determine number of machine instructions executed per operation
Processor and memory system → determine how fast instructions are executed
I/O system (including OS) → determines how fast I/O operations are executed

SLIDE 38

Defining Performance

Which airplane has the best performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]

SLIDE 39

Response Time and Throughput

Response time: how long it takes to do a task, sometimes also called latency [time/work]
Throughput: total work done per unit time [work/time]

How are response time and throughput affected by...
Replacing the processor with a faster version?
Adding more processors?

For now, we’ll focus on response time

SLIDE 40

Relative Performance

Define: Performance = 1 / Execution Time

“X is n times faster than Y” → Performance X / Performance Y = Execution Time Y / Execution Time X = n

Example:
Program takes 10 s to run on machine A, 15 s on machine B
Execution Time B / Execution Time A = 15 / 10 = 1.5
“A is 1.5 times faster than B”

SLIDE 41

Measuring Execution Time

Define: Elapsed Time
Total response time including all aspects (processing, I/O, overhead, idle time)

Define: CPU Time
Time spent processing a given job (discounts I/O time, other jobs' shares)
Elapsed Time > CPU Time

SLIDE 42

CPU Clocking

Operation of digital hardware governed by a constant-rate clock

[Figure: timing diagram of the clock, data transfer and computation, and state update over time, with the clock period marked]

Clock period: duration of a clock cycle, e.g., 250ps = 0.25ns
Clock frequency (rate): cycles per second, e.g., 4.0GHz = 4000MHz

SLIDE 43

CPU Time

CPU Time = CPU Clock Cycles * Clock Cycle Time = CPU Clock Cycles / Clock Rate

Performance improved by:

1. Reducing number of clock cycles
2. Increasing clock rate (reducing clock period)

Hardware designer must often trade off clock rate against cycle count.

SLIDE 44

CPU Time Example

Computer A: 2GHz clock, 10s CPU time
Designing Computer B:
Aim for 6s CPU Time
Clock rate increase requires 1.2x the number of cycles

How fast must Computer B’s clock be?

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$
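The same calculation can be checked numerically. This is a small sketch using the numbers from the example; the function name `required_clock_rate_hz` and its parameter names are hypothetical.

```python
def required_clock_rate_hz(clock_rate_a_hz: float, cpu_time_a_s: float,
                           cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs in order to finish in target_time_s."""
    cycles_a = cpu_time_a_s * clock_rate_a_hz  # CPU Time = Cycles / Clock Rate
    cycles_b = cycle_factor * cycles_a         # B needs 1.2x as many cycles
    return cycles_b / target_time_s


rate_b = required_clock_rate_hz(2e9, 10.0, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")
```

B must run at twice A's clock rate to cut CPU time from 10s to 6s, because part of the gain is spent on the extra cycles.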

SLIDE 45

Instruction Count and CPI

Clock Cycles = Instruction Count * Cycles per Instruction
CPU Time = Instruction Count * CPI * Clock Cycle Time = (Instruction Count * CPI) / Clock Rate

Instruction count
Determined by program, ISA, and compiler
Average cycles per instruction (CPI)
Determined by CPU hardware
If different instructions have different CPI, can compute a weighted average based on instruction mix
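The weighted average mentioned above can be sketched as follows. The instruction mix is made up purely for illustration (three hypothetical classes with CPI 1, 2, and 3), and the function names `average_cpi` and `cpu_time_s` are hypothetical.

```python
def average_cpi(mix) -> float:
    """Weighted-average CPI for a list of (instruction_count, cpi) pairs."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cpi for count, cpi in mix)
    return total_cycles / total_instructions


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = (Instruction Count * CPI) / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical mix: 2 instructions at CPI 1, 1 at CPI 2, 2 at CPI 3.
mix = [(2, 1), (1, 2), (2, 3)]
cpi = average_cpi(mix)  # (2*1 + 1*2 + 2*3) / 5 cycles per instruction
print(cpi, cpu_time_s(5, cpi, 1e9))
```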

SLIDE 46

CPI Example

Computer A: cycle time = 250ps, CPI = 2.0
Computer B: cycle time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

A is faster...

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

...by this much

SLIDE 47

Amdahl’s Law

Be aware when optimizing...

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

Example: On machine A, multiplication accounts for 80s out of 100s total CPU time. How much improvement in multiplication performance is needed to get a 5x speedup overall?

Corollary: make the common case fast
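The multiplication example can be explored numerically. A 5x overall speedup would require an improved time of 100s / 5 = 20s, but the unaffected 20s already uses up that whole budget, so 80s / n would have to be zero and no finite improvement factor n is enough. The sketch below shows the speedup approaching 5x without reaching it; the function names `improved_time` and `overall_speedup` are hypothetical.

```python
def improved_time(t_affected: float, t_unaffected: float,
                  improvement_factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float,
                    improvement_factor: float) -> float:
    """Original total time divided by the improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)


# Multiplication: 80s affected, 20s unaffected.
for n in (2, 4, 10, 100, 1e6):
    print(n, overall_speedup(80.0, 20.0, n))
```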

SLIDE 48

Performance Summary

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Performance depends on all of these things.
Algorithm, programming language, and compiler affect instruction count and CPI (the first two terms).
ISA affects all three.

SLIDE 49


Single-Cycle CPU Performance Issues

Longest delay determines clock period
Critical path: load instruction
instruction memory → register file → ALU → data memory → register file
Not feasible to vary clock period for different instructions
We will improve performance by pipelining
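To see why the load instruction sets the clock period, the sketch below sums per-element delays along each instruction's path. All delay values are hypothetical, chosen only for illustration, and the paths for sw, R-type, and beq are assumptions; only the lw path is listed on the slide.

```python
# Hypothetical element delays in picoseconds (not from the slides).
STAGE_DELAY_PS = {
    "instruction memory": 200,
    "register read": 100,
    "ALU": 200,
    "data memory": 200,
    "register write": 100,
}

# Elements each instruction class passes through in one cycle.
PATHS = {
    "lw": ["instruction memory", "register read", "ALU", "data memory", "register write"],
    "sw": ["instruction memory", "register read", "ALU", "data memory"],
    "R-type": ["instruction memory", "register read", "ALU", "register write"],
    "beq": ["instruction memory", "register read", "ALU"],
}


def path_delay_ps(instruction: str) -> int:
    """Total delay through every element the instruction uses."""
    return sum(STAGE_DELAY_PS[stage] for stage in PATHS[instruction])


def clock_period_ps() -> int:
    """A single-cycle design must fit the slowest instruction in one cycle."""
    return max(path_delay_ps(instruction) for instruction in PATHS)


for name in PATHS:
    print(name, path_delay_ps(name))
print("clock period:", clock_period_ps())
```

With these assumed numbers every instruction takes as long as lw, even though beq would need much less time; that waste is what pipelining targets.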


SLIDE 50

Pipelining Preview: Laundry Analogy