SLIDE 1

Lecture 10: Processor design – pipelining

  • Overlapping the execution of instructions
  • Pipeline hazards
    – Different types
    – How to remove them

Inf2C Computer Systems - 2011-2012 1

SLIDE 2

Pipelining

Classic case: make all instructions take 5 steps, e.g.:

lw r1, n(r2) # r1=memory[n+r2]

Step  Name  Datapath operation
1     IF    Fetch instruction; PC+4 → PC
2     REG   Get value from r2
3     ALU   ALU n+r2
4     MEM   Get data from memory
5     WB    Write memory data into r1

IF = instruction fetch (includes PC increment)
REG = fetching values from general purpose registers
ALU = arithmetic/logic operations
MEM = memory access
WB = write back results to general purpose registers

SLIDE 3

Pipelining

Start one instruction per clock cycle

cycle  1    2    3    4    5    6    7    8    9
       IF   REG  ALU  MEM  WB
            IF   REG  ALU  MEM  WB
                 IF   REG  ALU  MEM  WB
                      IF   REG  ALU  MEM  WB
                           IF   REG  ALU  MEM  WB
(instruction flow: top to bottom)

  • Five instructions are being executed (in different stages) during the same cycle
  • Each instruction still takes 5 cycles, but instructions now complete every cycle: CPI → 1
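
The CPI → 1 claim can be checked with a few lines of arithmetic. The sketch below assumes an ideal pipeline with no hazards; the function names `pipeline_cycles` and `cpi` are illustrative and not part of the lecture.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 5) -> int:
    """Cycles needed to finish all instructions in an ideal pipeline (no hazards)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; every later instruction
    # completes exactly one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def cpi(num_instructions: int, num_stages: int = 5) -> float:
    """Average cycles per instruction for the ideal pipeline."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


for n in (5, 100, 10_000):
    print(n, pipeline_cycles(n), round(cpi(n), 4))
# 5 9 1.8
# 100 104 1.04
# 10000 10004 1.0004
```

Five instructions finish in 9 cycles, matching the diagram, and the average CPI approaches 1 as the instruction count grows.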

SLIDE 4

Preparing instructions for pipelining

Stretch the execution to the max number of cycles, e.g.

sw r1, n(r2) # memory[n+r2]=r1

IF    Fetch instruction; PC+4 → PC
REG   Get values of r1 and r2 from registers
ALU   ALU n+r2
MEM   Store value of r1 to memory
WB    Do nothing

add r1, r2, r3 # r1=r2+r3

IF    Fetch instruction; PC+4 → PC
REG   Get values of r2 and r3 from registers
ALU   ALU r2+r3
MEM   Do nothing
WB    Write result to r1
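
The stretching can be illustrated with a toy interpreter that pushes every instruction through all five stages and records "do nothing" where a stage has no work. This is only a sketch: the tuple encoding of instructions, the `execute` function, and the register and memory values are assumptions made for illustration.

```python
def execute(instr, regs, mem):
    """Run one instruction through the five stages; return a per-stage log."""
    op = instr[0]
    log = {"IF": "fetch instruction; PC+4 -> PC"}
    if op == "lw":                      # lw rt, n(rs): rt = mem[n + rs]
        _, rt, n, rs = instr
        base = regs[rs]
        log["REG"] = f"read {rs}"
        addr = n + base
        log["ALU"] = f"compute address {addr}"
        data = mem[addr]
        log["MEM"] = "load data from memory"
        regs[rt] = data
        log["WB"] = f"write {rt}"
    elif op == "sw":                    # sw rt, n(rs): mem[n + rs] = rt
        _, rt, n, rs = instr
        value, base = regs[rt], regs[rs]
        log["REG"] = f"read {rt} and {rs}"
        addr = n + base
        log["ALU"] = f"compute address {addr}"
        mem[addr] = value
        log["MEM"] = "store data to memory"
        log["WB"] = "do nothing"
    elif op == "add":                   # add rd, rs, rt: rd = rs + rt
        _, rd, rs, rt = instr
        x, y = regs[rs], regs[rt]
        log["REG"] = f"read {rs} and {rt}"
        total = x + y
        log["ALU"] = "add operands"
        log["MEM"] = "do nothing"
        regs[rd] = total
        log["WB"] = f"write {rd}"
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return log


# Hypothetical machine state, chosen only for the demonstration.
regs = {"r1": 0, "r2": 100, "r3": 7}
mem = {108: 42}
program = [("lw", "r1", 8, "r2"), ("add", "r1", "r2", "r3"), ("sw", "r1", 8, "r2")]
for instr in program:
    print(instr[0], execute(instr, regs, mem))
```

Every log has the same five entries, which is what lets the pipeline treat all instructions uniformly.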

SLIDE 5

Execution speedup

[Figure: pipeline timing diagram over cycles 1 to 15; every instruction passes through IF, REG, ALU, MEM, WB]

Speed-up roughly equal to the number of stages
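
A short calculation shows where this comes from. It assumes an ideal pipeline with no hazards, the same clock period in both designs, and an unpipelined machine that spends k cycles on each instruction; the symbols n (instruction count), k (number of stages) and S (speed-up) are introduced here and do not appear on the slides.

```latex
\[
S(n) \;=\; \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}}
     \;=\; \frac{n\,k}{k + (n - 1)},
\qquad
\lim_{n \to \infty} S(n) \;=\; k .
\]
```

For example, with k = 5 and n = 100 the speed-up is 500/104 ≈ 4.8, already close to the number of stages.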

SLIDE 6

Pipeline hazards

Complications in pipelining, called hazards:

– Structural
– Data
– Control

Hazards limit the speedup achieved: CPI rises above 1
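
A minimal sketch of how stalls push CPI above 1, assuming each kind of hazard can be summarised as "a fraction of instructions that each lose some cycles". The frequencies and stall counts below are invented for illustration, and the speed-up assumes an unpipelined machine that takes one cycle per stage.

```python
def effective_cpi(stall_sources):
    """CPI = 1 + sum(frequency * stall cycles) over all stall sources."""
    return 1.0 + sum(frequency * stall_cycles
                     for frequency, stall_cycles in stall_sources)


def speedup_over_unpipelined(num_stages, cpi):
    """Speed-up relative to a machine that needs num_stages cycles per instruction."""
    return num_stages / cpi


# Hypothetical workload: 10% of instructions stall 1 cycle, 5% stall 3 cycles.
example = [(0.10, 1), (0.05, 3)]
cpi = effective_cpi(example)
print(round(cpi, 2))                               # 1.25
print(round(speedup_over_unpipelined(5, cpi), 2))  # 4.0
```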

SLIDE 7

Structural hazards

Example: instructions in IF and MEM stages may conflict for access to memory (cache)

cycle  1    2    3    4    5    6    7    8    9
lw     IF   REG  ALU  MEM  WB
I1          IF   REG  ALU  MEM  WB
I2               IF   REG  ALU  MEM  WB
I3                    --   IF   REG  ALU  MEM  WB

-- = “bubble”

SLIDE 8

Structural hazards

Not enough hardware resources to execute a combination of instructions in the same clock cycle

Straightforward solution: use more resources

– E.g. split cache into instruction cache (used in IF) and data cache (used in MEM)

Good design – provide enough resources to avoid hazards for common/frequent cases
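
The effect of splitting the cache can be sketched with a small model. It assumes a single memory port that serves either one IF or one MEM access per cycle, that only loads and stores use the MEM stage, and that a conflict delays the fetch by inserting bubbles; `fetch_cycles` and its parameters are hypothetical names.

```python
MEM_OFFSET = 3  # MEM is the 4th stage, so it happens 3 cycles after IF


def fetch_cycles(uses_memory, split_caches):
    """Return the cycle in which each instruction is fetched.

    uses_memory: one bool per instruction, True for loads/stores.
    split_caches: True if IF and MEM have separate caches (no conflict).
    """
    busy = set()   # cycles in which the shared port is taken by a MEM stage
    cycles = []
    cycle = 0
    for is_mem in uses_memory:
        cycle += 1
        if not split_caches:
            while cycle in busy:
                cycle += 1          # bubble: wait for the memory port
        cycles.append(cycle)
        if is_mem:
            busy.add(cycle + MEM_OFFSET)
    return cycles


program = [True, False, False, False]   # lw, I1, I2, I3
print(fetch_cycles(program, split_caches=False))  # [1, 2, 3, 5]
print(fetch_cycles(program, split_caches=True))   # [1, 2, 3, 4]
```

With a unified memory, I3 cannot be fetched in cycle 4 because lw is in MEM; with split instruction and data caches the bubble disappears.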

SLIDE 9

Data hazards

One instruction must use a value produced by a previous instruction

Example:

add  r2, r1, r5
lw   r3, 4(r1)
addi r4, r3, n

cycle  1    2    3    4    5    6    7    8    9    10
add    IF   REG  ALU  MEM  WB
lw          IF   REG  ALU  MEM  WB
addi             IF   --   --   --   REG  ALU  MEM  WB

3 cycle stall (-- = “bubble”)

SLIDE 10

Data hazards

Processor must detect hazards and insert bubbles

Solution: compiler can separate dependent instructions

lw   r3, 4(r1)
add  r2, r1, r5
addi r4, r3, n

cycle  1    2    3    4    5    6    7    8    9
lw     IF   REG  ALU  MEM  WB
add         IF   REG  ALU  MEM  WB
addi             IF   --   --   REG  ALU  MEM  WB

2 cycle stall (-- = “bubble”)
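
Hazard detection of this kind can be sketched as a small scheduling loop. The model assumes no forwarding and that a register written in WB can only be read by a REG stage in a later cycle, which is what yields the 3-cycle and 2-cycle stalls for this example. The function name and the tuple encoding of instructions are illustrative.

```python
def stalls_without_forwarding(program):
    """Return (name, stall cycles) for each instruction, in program order.

    program: list of (name, destination register or None, source registers).
    """
    wb_cycle = {}        # register -> cycle in which its latest writer is in WB
    prev_reg_cycle = 1   # so that the first instruction reaches REG in cycle 2
    stalls = []
    for name, dest, sources in program:
        earliest = prev_reg_cycle + 1
        # REG must come strictly after the WB of every producer it depends on.
        ready = max((wb_cycle.get(src, 0) + 1 for src in sources),
                    default=earliest)
        reg_cycle = max(earliest, ready)
        stalls.append((name, reg_cycle - earliest))
        if dest is not None:
            wb_cycle[dest] = reg_cycle + 3   # REG -> ALU -> MEM -> WB
        prev_reg_cycle = reg_cycle
    return stalls


original = [("add", "r2", ("r1", "r5")),
            ("lw", "r3", ("r1",)),
            ("addi", "r4", ("r3",))]
reordered = [original[1], original[0], original[2]]

print(stalls_without_forwarding(original))   # [('add', 0), ('lw', 0), ('addi', 3)]
print(stalls_without_forwarding(reordered))  # [('lw', 0), ('add', 0), ('addi', 2)]
```

Moving the independent add between lw and addi hides one of the three stall cycles, which is the kind of reordering a compiler can perform.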

SLIDE 11

Data forwarding

The data is actually available before the end of WB

Why not forward it directly to the unit/stage where it is needed?

cycle  1    2    3    4    5    6    7    8    9
add    IF   REG  ALU  MEM  WB
lw          IF   REG  ALU  MEM  WB
addi             IF   --   REG  ALU  MEM  WB

1 cycle stall (-- = “bubble”)
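
The stall counts with and without forwarding follow from one small formula. The sketch assumes a value becomes usable at the end of the stage that produces it and must be present at the start of the stage that consumes it; `stall_cycles`, its parameters, and the stage numbering are assumptions made for illustration.

```python
STAGE = {"IF": 1, "REG": 2, "ALU": 3, "MEM": 4, "WB": 5}


def stall_cycles(produced_at_end_of, needed_at_start_of, distance=1):
    """Stall cycles for a consumer issued `distance` instructions after its producer."""
    produced = STAGE[produced_at_end_of]
    needed = STAGE[needed_at_start_of]
    return max(0, produced - needed - distance + 1)


# No forwarding: result written in WB, read in REG.
print(stall_cycles("WB", "REG"))               # 3
print(stall_cycles("WB", "REG", distance=2))   # 2
# Forwarding: loaded data leaves MEM and goes straight to the ALU input.
print(stall_cycles("MEM", "ALU"))              # 1
# Forwarding between two ALU instructions: no stall at all.
print(stall_cycles("ALU", "ALU"))              # 0
```

The lw followed immediately by a dependent addi still costs one cycle, because the loaded value does not exist until the end of MEM.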

SLIDE 12

Control hazards

Before a conditional branch instruction is resolved, the processor does not know where to fetch the next instruction from

Example: beq r1, r2, n

IF    Fetch instruction; PC+4 → PC
REG   Get values of r1 and r2 from registers
ALU   ALU r1-r2 and PC+n
MEM   If r1-r2==0 update PC
WB    Do nothing

Branch is identified in IF but only resolved in MEM

SLIDE 13

Control hazards

[Figure: pipeline timing diagram for beq and the instructions that follow it; the delay before the next instruction can be fetched is labelled “Branch latency”]

SLIDE 14

Branch prediction

Solution: predict outcome of branch

– If prediction correct, bubble is reduced or eliminated
– If prediction incorrect, processor must discard (“flush” or “squash”) incorrectly loaded instructions

[Figure: pipeline timing diagram for beq followed by four instructions fetched in consecutive cycles; the instructions fetched down the wrong path are marked “Flushed, on misprediction”]
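
The cost of mispredictions can be sketched by counting flushed instructions for a sequence of branch outcomes. Two things here are assumptions rather than material from the slides: the penalty of 3 instructions per misprediction (taken from the branch being resolved in MEM, three cycles after it is fetched) and the two toy prediction schemes.

```python
BRANCH_PENALTY = 3  # assumed: instructions fetched before the branch resolves in MEM


def flushed_instructions(outcomes, predict):
    """Count instructions flushed over a sequence of branch outcomes.

    outcomes: list of bools, True if the branch was taken.
    predict: function mapping the previous outcome (or None) to a prediction.
    """
    mispredictions = 0
    last = None
    for taken in outcomes:
        if predict(last) != taken:
            mispredictions += 1
        last = taken
    return mispredictions * BRANCH_PENALTY


def always_not_taken(last):
    return False


def same_as_last(last):
    # Predict whatever the branch did last time; not taken on first sight.
    return bool(last)


# A loop branch that is taken 9 times and then falls through.
loop_branch = [True] * 9 + [False]
print(flushed_instructions(loop_branch, always_not_taken))  # 27
print(flushed_instructions(loop_branch, same_as_last))      # 6
```

For this loop the history-based scheme mispredicts only on the first and last iterations, so far fewer instructions are flushed.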

SLIDE 15

Is this the end in performance improvement?

Superscalar processors:

– Can fetch more than 1 instruction per cycle
– Have multiple pipelines and ALUs to execute multiple instructions simultaneously

Predicated execution:

– Execute instructions from both targets of the branch simultaneously and discard the incorrect one (e.g. IA-64) (against control hazards)

Value prediction:

– Predict result of instructions (against data hazards)

Multiprocessors
