SLIDE 1

8/28/2006 1

CS425 – Computer System Design Lecture 10 – Pipelining Hazards

Shankar Balachandran
Dept. of Computer Science and Engineering, IIT-Madras
shankar@cse.iitm.ernet.in

SLIDE 2

Recap

SLIDE 3

Reference

  • Hennessy and Patterson

SLIDE 4

It's Not That Easy for Computers

  • Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
    – Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
    – Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
    – Control hazards: Pipelining of branches & other instructions that change the PC
    – Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline
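The cost of those bubbles is often summarized with a standard textbook approximation. It is a simplified model, assuming an ideal CPI of 1 and the same clock cycle with and without pipelining, not a full timing analysis:

```latex
\[
\text{Speedup from pipelining} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}
\]
```

For example, a 5-stage pipeline that averages 0.5 stall cycles per instruction would reach a speedup of about 5 / 1.5 ≈ 3.3 instead of the ideal 5.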

SLIDE 5

Structural Hazard – One Memory Port

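[Figure: pipeline diagram, instruction order (Load, Inst 1, Inst 2, Inst 3) against time; each instruction passes through IF, Reg, ALU, Dm, Reg. With one memory port, Load's data-memory access (Dm) and Inst 3's instruction fetch (IF) need the port in the same cycle, marked "Structural Hazard".]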
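The conflict on this slide can be found mechanically. A minimal Python sketch, assuming a classic 5-stage pipeline that starts one instruction per cycle with no stalls, so instruction i is in stage s during cycle i + s (both 0-based); the function name `memory_port_conflicts` is illustrative, not from the lecture:

```python
# Assumed stage offsets (0-based): IF = 0, Reg = 1, ALU = 2, Dm = 3, Reg (write) = 4.
IF_STAGE, MEM_STAGE = 0, 3


def memory_port_conflicts(uses_data_memory):
    """uses_data_memory[i] is True if instruction i reads or writes data memory.

    Returns {cycle: [instruction indices]} for every cycle in which more than
    one instruction needs the single shared memory port.
    """
    requests = {}
    for i, needs_dm in enumerate(uses_data_memory):
        # Every instruction fetches from memory in IF.
        requests.setdefault(i + IF_STAGE, []).append(i)
        # Loads and stores also use the port in Dm.
        if needs_dm:
            requests.setdefault(i + MEM_STAGE, []).append(i)
    return {cycle: users for cycle, users in requests.items() if len(users) > 1}


# Load, Inst 1, Inst 2, Inst 3 (only Load touches data memory).
print(memory_port_conflicts([True, False, False, False]))  # {3: [0, 3]}
```

The result says that in cycle 3 (the fourth cycle) instruction 0 (Load) and instruction 3 both need the port, which is the collision in the diagram.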

SLIDE 6

Resolving Structural Hazards

  • Defn: attempt to use same hardware for two different things at the same time
  • Solution 1: Wait
    ⇒ must detect the hazard
    ⇒ must have mechanism to stall
  • Solution 2: Throw more hardware at the problem

SLIDE 7

Detection and Resolution

[Figure: the same pipeline diagram (Load, Inst 1, Inst 2, then the following instruction) with a stall: a row of bubbles is inserted so that the following instruction's IF no longer coincides with Load's Dm.]
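Solution 1 (wait) can be sketched the same way: delay each fetch until the single memory port is free. This is a minimal sketch under illustrative assumptions (5-stage pipeline, IF at stage offset 0, Dm at offset 3, in-order fetch of at most one instruction per cycle); `schedule_fetches` is a hypothetical helper, and it models only when each instruction is fetched:

```python
# Assumed stage offsets (0-based) of the two memory-using stages.
IF_OFFSET, MEM_OFFSET = 0, 3


def schedule_fetches(uses_data_memory):
    """Return the (0-based) cycle in which each instruction is fetched.

    A fetch is delayed (a bubble is inserted) while either its own IF or its
    own later Dm access would collide with a cycle already reserved on the port.
    """
    busy = set()          # cycles in which the memory port is already reserved
    fetch_cycles = []
    next_cycle = 0
    for needs_dm in uses_data_memory:
        cycle = next_cycle
        while cycle + IF_OFFSET in busy or (needs_dm and cycle + MEM_OFFSET in busy):
            cycle += 1    # stall: this fetch slot becomes a bubble
        busy.add(cycle + IF_OFFSET)
        if needs_dm:
            busy.add(cycle + MEM_OFFSET)
        fetch_cycles.append(cycle)
        next_cycle = cycle + 1   # in-order: later instructions cannot overtake
    return fetch_cycles


print(schedule_fetches([True, False, False, False]))  # [0, 1, 2, 4]
```

The fourth fetch slips from cycle 3 to cycle 4: one stall cycle, the row of bubbles in the figure.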

SLIDE 8

Instruction Set and Structural Hazard

  • Simple to determine the sequence of resources used by an instruction
    – opcode tells it all
  • Uniformity in the resource usage
  • Compare MIPS to IA32?
  • MIPS approach => all instructions flow through the same 5-stage pipeline

SLIDE 9

Data Hazards

[Figure: pipeline diagram (IF, Reg, ALU, Dm, Reg) for the sequence below; add writes r1 only in its final Reg stage, while the instructions after it read r1 in their own Reg stage.]

add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or r8, r1, r9
xor r10, r1, r11
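Which of these reads actually see a stale r1 depends on when registers are read and written. A minimal sketch, assuming no forwarding, register reads in stage 2 (Reg), writes in stage 5, and a register file that writes in the first half of a cycle and reads in the second half (an assumption about the design, not stated on the slide); `Instr` and `raw_hazards` are illustrative names:

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


# 0-based stage indices: Reg read (ID) = 1, Reg write (WB) = 4.
READ_STAGE, WRITE_STAGE = 1, 4


def raw_hazards(program):
    """Return (producer, consumer) pairs where the consumer reads before the write.

    Instruction k is in stage s during cycle k + s (one start per cycle, no stalls).
    A read in the same cycle as the write is safe under the split-cycle assumption.
    """
    hazards = []
    for i, producer in enumerate(program):
        write_cycle = i + WRITE_STAGE
        for j in range(i + 1, len(program)):
            consumer = program[j]
            read_cycle = j + READ_STAGE
            if producer.dest in consumer.srcs and read_cycle < write_cycle:
                hazards.append((producer.text, consumer.text))
            if consumer.dest == producer.dest:
                break  # later readers want the newer value instead
    return hazards


program = [
    Instr("add r1,r2,r3", "r1", ("r2", "r3")),
    Instr("sub r4,r1,r3", "r4", ("r1", "r3")),
    Instr("and r6,r1,r7", "r6", ("r1", "r7")),
    Instr("or r8,r1,r9", "r8", ("r1", "r9")),
    Instr("xor r10,r1,r11", "r10", ("r1", "r11")),
]
print(raw_hazards(program))
```

Under these assumptions only sub and and are reported: or reads r1 in the same cycle add writes it, and xor reads it a cycle later.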

SLIDE 10

Three Generic Data Hazards

  • Read After Write (RAW)

InstrJ tries to read operand before InstrI writes it

  • Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

I: add r1,r2,r3
J: sub r4,r1,r3

SLIDE 11

Three Generic Data Hazards

  • Write After Read (WAR)

InstrJ writes operand before InstrI reads it

  • Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
  • Can’t happen in MIPS 5 stage pipeline because:
    – All instructions take 5 stages, and
    – Reads are always in stage 2, and
    – Writes are always in stage 5

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

SLIDE 12

Three Generic Data Hazards

  • Write After Write (WAW)

InstrJ writes operand before InstrI writes it.

  • Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
  • Can’t happen in MIPS 5 stage pipeline because:
    – All instructions take 5 stages, and
    – Writes are always in stage 5

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
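The three definitions, and the timing argument for why WAR and WAW cannot occur in the 5-stage MIPS pipeline, can be written out directly. A minimal sketch; `classify` and `hazard_possible` are illustrative names, and the stage numbers follow the slides (reads in stage 2, writes in stage 5, one instruction started per cycle):

```python
READ_STAGE, WRITE_STAGE = 2, 5  # 1-based stage numbers from the slides


def classify(i_dest, i_srcs, j_dest, j_srcs):
    """Dependences of a later instruction J on an earlier instruction I."""
    kinds = set()
    if i_dest in j_srcs:
        kinds.add("RAW")  # J reads what I writes: true data dependence
    if j_dest in i_srcs:
        kinds.add("WAR")  # J writes what I reads: anti-dependence
    if j_dest == i_dest:
        kinds.add("WAW")  # both write the same register: output dependence
    return kinds


def hazard_possible(kind, i_start, j_start):
    """Can the dependence become a hazard? I starts in cycle i_start, J later in j_start."""
    # An instruction started in cycle t is in stage s during cycle t + s - 1.
    i_read, i_write = i_start + READ_STAGE - 1, i_start + WRITE_STAGE - 1
    j_read, j_write = j_start + READ_STAGE - 1, j_start + WRITE_STAGE - 1
    if kind == "RAW":
        return j_read < i_write    # J reads before I has written (no forwarding)
    if kind == "WAR":
        return j_write <= i_read   # J would overwrite before I has read
    if kind == "WAW":
        return j_write <= i_write  # J would write no later than I, leaving I's old value
    raise ValueError(kind)


print(classify("r1", ("r2", "r3"), "r4", ("r1", "r3")))  # I: add r1,r2,r3  J: sub r4,r1,r3
print(classify("r4", ("r1", "r3"), "r1", ("r2", "r3")))  # I: sub r4,r1,r3  J: add r1,r2,r3
print(classify("r1", ("r4", "r3"), "r1", ("r2", "r3")))  # I: sub r1,r4,r3  J: add r1,r2,r3
print([hazard_possible(k, 0, 1) for k in ("RAW", "WAR", "WAW")])  # [True, False, False]
```

Because J always starts after I and every instruction reads in stage 2 and writes in stage 5, J's write can never land before I's read or I's write, which is exactly the argument on the two slides above.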

SLIDE 13

Forwarding to Avoid Data Hazard

[Figure: the same sequence as the Data Hazards slide, with forwarding paths drawn from add's ALU result, held in the pipeline registers, to the ALU inputs of the dependent instructions.]

add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or r8, r1, r9
xor r10, r1, r11

SLIDE 14

HW Change for Forwarding

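[Figure: datapath with forwarding: pipeline registers ID/EX, EX/MEM, and MEM/WR; Registers, Immediate, and NextPC; the ALU and Data Memory; muxes in front of the ALU inputs select either the register-file value or a result forwarded from EX/MEM or MEM/WR.]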
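The muxes added here need a select rule. A minimal sketch modeled on the usual textbook forwarding conditions; the class `PipeRegState`, its fields `reg_write` and `rd`, and the returned labels (named after the pipeline registers ID/EX, EX/MEM, MEM/WR on this slide) are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeRegState:
    """Hypothetical view of one pipeline register (EX/MEM or MEM/WR)."""
    reg_write: bool      # will the instruction held here write the register file?
    rd: Optional[int]    # its destination register number, if any


def forward_select(src: int, ex_mem: PipeRegState, mem_wr: PipeRegState) -> str:
    """Pick the ALU input for source register `src` of the instruction now in EX."""
    # r0 is hardwired to zero in MIPS, so it is never forwarded.
    if src == 0:
        return "ID/EX"
    # Newest result first: the instruction one ahead of us is in EX/MEM.
    if ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"
    # Otherwise the instruction two ahead may be in MEM/WR.
    if mem_wr.reg_write and mem_wr.rd == src:
        return "MEM/WR"
    # No pending write: use the register-file value read in ID.
    return "ID/EX"


# sub r4,r1,r3 in EX, add r1,r2,r3 in EX/MEM:
print(forward_select(1, PipeRegState(True, 1), PipeRegState(False, None)))  # EX/MEM
# and r6,r1,r7 in EX, sub (writes r4) in EX/MEM, add in MEM/WR:
print(forward_select(1, PipeRegState(True, 4), PipeRegState(True, 1)))  # MEM/WR
```

Checking EX/MEM before MEM/WR matters when both hold writes to the same register: the more recent instruction's value must win.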

SLIDE 15

Data Hazard Even With Forwarding

[Figure: pipeline diagram for the sequence below; lw produces r1 only at the end of its Dm stage, too late to forward to the ALU stage of the sub that immediately follows.]

lw r1, 0(r2)
sub r4, r1, r6
and r6, r1, r7
or r8, r1, r9

SLIDE 16

Resolving this Load Hazard

[Figure: the same sequence with a bubble inserted after lw; sub, and, and or each start one cycle later, so the loaded r1 can be forwarded from lw's Dm stage to sub's ALU stage.]

lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
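The interlock that inserts this bubble only has to compare the load's destination with the next instruction's sources. A minimal sketch, assuming full forwarding otherwise (so a load followed immediately by a use is the only case that stalls) and a hypothetical tuple encoding (op, dest, sources); as in the figure, the bubble is modeled as shifting the dependent instruction's whole row by one cycle:

```python
def start_cycles(program):
    """Return the (0-based) cycle in which each instruction starts, with load-use bubbles."""
    cycles = []
    cycle = 0
    prev = None
    for op, dest, srcs in program:
        # Load-use hazard: the previous instruction loads a register this one reads.
        # With forwarding, one bubble suffices: the value is ready after lw's Dm stage.
        if prev is not None and prev[0] == "lw" and prev[1] in srcs:
            cycle += 1
        cycles.append(cycle)
        prev = (op, dest, srcs)
        cycle += 1
    return cycles


program = [
    ("lw", "r1", ("r2",)),        # lw  r1, 0(r2)
    ("sub", "r4", ("r1", "r6")),  # sub r4, r1, r6
    ("and", "r6", ("r1", "r7")),  # and r6, r1, r7
    ("or", "r8", ("r1", "r9")),   # or  r8, r1, r9
]
print(start_cycles(program))  # [0, 2, 3, 4]
```

Only sub is delayed directly; and and or follow one cycle later simply because they are behind it, and by then r1 can be forwarded normally.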

SLIDE 17

Software Scheduling

Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f are in memory.

Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd

Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
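The gain comes from removing load-use stalls. A minimal sketch, assuming full forwarding (so the only stall is one bubble when an instruction uses a register loaded by the instruction right before it), a 5-stage pipeline, and a hypothetical encoding (op, dest, sources) in which the memory operands b, c, ... are symbolic labels and SW has no register destination:

```python
PIPELINE_DEPTH = 5


def load_use_stalls(code):
    """Count adjacent pairs where an LW result is used by the very next instruction."""
    return sum(
        1
        for (op, dest, _), (_, _, srcs) in zip(code, code[1:])
        if op == "LW" and dest in srcs
    )


def total_cycles(code):
    # The first instruction needs PIPELINE_DEPTH cycles, each later one adds one, plus bubbles.
    return PIPELINE_DEPTH + len(code) - 1 + load_use_stalls(code)


slow = [("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
        ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
        ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]
fast = [("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
        ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
        ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]

for name, code in (("slow", slow), ("fast", fast)):
    print(f"{name}: {load_use_stalls(code)} stalls, {total_cycles(code)} cycles")
# slow: 2 stalls, 14 cycles
# fast: 0 stalls, 12 cycles
```

In the slow code, ADD waits on LW Rc and SUB waits on LW Rf; the fast schedule puts an independent instruction between each load and its first use.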

SLIDE 18

Instruction Set Connection

  • What is exposed about this organizational hazard in the instruction set?
  • k cycle delay?
    – bad, CPI is not part of ISA
  • k instruction slot delay
    – load should not be followed by use of the value in the next k instructions

  • Nothing, but code can reduce run-time delays
  • MIPS did the transformation in the assembler
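If the ISA exposes a k-instruction load delay, a tool must enforce the rule even when it cannot find useful work for the slots. A minimal sketch of the naive fallback, padding with NOPs (it does not reorder instructions the way the fast schedule on the Software Scheduling slide does); the encoding (op, dest, sources) and the name `pad_load_delay` are hypothetical:

```python
NOP = ("NOP", None, ())


def pad_load_delay(code, k):
    """Insert NOPs so no instruction reads a register loaded in the previous k slots."""
    out = []
    for ins in code:
        _, _, srcs = ins
        # Keep padding while a load among the last k emitted slots feeds this instruction.
        while k > 0 and any(op == "LW" and dest in srcs for op, dest, _ in out[-k:]):
            out.append(NOP)
        out.append(ins)
    return out


slow = [("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
        ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
        ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]

for ins in pad_load_delay(slow, 1):
    print(ins[0])
```

With k = 1 the slow code grows from 8 to 10 instructions (a NOP before ADD and before SUB); a scheduler that reorders, as the MIPS assembler did, can often fill those slots with useful instructions instead.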
SLIDE 19

Historical Perspective: Microprogramming

[Figure: main memory holding the user program plus data (ADD, SUB, AND, . . ., DATA), annotated "this can change!", feeding a CPU built from a control memory and an execution unit; annotation: "One of these is mapped into one of these."]

  • Supported complex instructions as a sequence of simple micro-instructions (register transfers, RTs)
  • Pipelined micro-instruction processing, but with a very limited view
  • Could not reorganize macroinstructions to enable pipelining

SLIDE 20

Control Hazard on Branches => Three Stage Stall

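[Figure: pipeline diagram for the sequence below; beq is resolved only at the end of its Dm stage, so the correct next instruction (xor at address 36 if the branch is taken) cannot be fetched until three cycles after the normal fetch slot.]

10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11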

SLIDE 21

Example : Branch Stall Impact

  • If 30% of instructions are branches, a 3-cycle stall is significant
  • Two-part solution:
    – Determine branch taken or not sooner, AND
    – Compute taken branch address earlier

  • MIPS branch tests if register = 0 or ≠ 0
  • MIPS Solution:
    – Move Zero test to ID/RF stage
    – Adder to calculate new PC in ID/RF stage
    – 1 clock cycle penalty for branch versus 3
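The effect of the MIPS change can be estimated with simple stall accounting. A minimal sketch, assuming a base CPI of 1, the slide's 30% branch frequency, and that every branch pays the full penalty whether or not it is taken; `cpi_with_branch_stalls` is an illustrative name:

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average CPI when each branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


before = cpi_with_branch_stalls(0.30, 3)  # branch resolved late: 3-cycle stall
after = cpi_with_branch_stalls(0.30, 1)   # zero test and target adder in ID/RF: 1 cycle
print(f"CPI {before:.2f} -> {after:.2f}, speedup {before / after:.2f}")
# CPI 1.90 -> 1.30, speedup 1.46
```

So under these assumptions, moving the test and the target computation into ID/RF makes the pipeline roughly 1.46 times faster on branch-heavy code, without changing the clock.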