Pipelining (Raul Queiroz Feitosa)



SLIDE 1

Pipelining

Raul Queiroz Feitosa

Parts of these slides are from the support material provided by W. Stallings

SLIDE 2

Objective

To present the Pipelining concept, its limitations and the techniques for performance optimization

SLIDE 3

Outline

• Instruction Cycle
• Instruction Pipelining
• Pipeline Performance
• Pipeline Hazards
  • Resource
  • Data
  • Control

SLIDE 4

Instruction Cycle State Diagram

[Figure: state diagram of the instruction cycle. States: instruction address decoding, instruction fetch, instruction operation decoding, operand address calculation, operand fetch, data operation, result address calculation, result store, interrupt check, interrupt. Transition labels: indirection (appears twice), multiple operands, multiple results, no interrupt, interrupt.]


SLIDE 6

Instruction Pipelining

• The instruction cycle is split into sequential steps.
• A specific hardware unit (pipeline stage) is built to perform each step.
• Pipeline stages are arranged as a chain.

[Figure: chain of pipeline stages 1, 2, ..., k]

SLIDE 7

Instruction Pipelining - Example

• FI: fetch instruction
• DI: decode instruction
• CO: calculate operand
• FO: fetch operand
• EI: execute instruction
• WO: write operand (result)

[Figure: chain of six pipeline stages FI → DI → CO → FO → EI → WO]

SLIDE 8

Instruction Pipeline Operation

[Figure: timing diagram of the six-stage pipeline, time 1 to 14. Each row is one instruction passing through FI, DI, CO, FO, EI, WO, and each instruction starts one time unit after the previous one.]
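The staggered operation shown in the figure can be reproduced with a short sketch. It assumes the ideal case (no hazards, every stage takes exactly one time unit) and the six stages of slide 7. The function names `timing_diagram` and `render` are illustrative and are not part of the original slides.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]  # stage order from slide 7


def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[t] is the stage occupied at time t + 1.

    Ideal case only: instruction i (0-based) is in stage s at time i + s + 1.
    """
    total_time = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_time  # "--" marks a slot where the instruction is not in the pipeline
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as plain text, one line per instruction."""
    header = f"{'time':<15}" + " ".join(f"{t:>2}" for t in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"{'instruction ' + str(i):<15}" + " ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(timing_diagram(9)))
```

With nine instructions and six stages the diagram spans 9 + 6 - 1 = 14 time units, which matches the time axis of the figure.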


SLIDE 10

Pipeline Performance

Assuming

• $k$ stages
• $\tau = \tau_1 = \tau_2 = \dots = \tau_k$ ($\tau_i$ is the time delay of the $i$-th stage)

$T_{n,k}$: time for a pipeline with $k$ stages to execute $n$ instructions

• $T_{n,1} = n k \tau$ → (conventional machine)
• $T_{n,k} = k\tau + (n-1)\tau = (n+k-1)\tau$ → (pipeline)

The speedup

$$S_k = \frac{T_{n,1}}{T_{n,k}} = \frac{nk\tau}{k\tau + (n-1)\tau} = \frac{nk}{n+k-1}$$

For large $n$

$$\lim_{n \to \infty} S_k = \lim_{n \to \infty} \frac{nk}{n+k-1} = k$$
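The two execution times and the resulting speedup can be checked numerically. The sketch below assumes equal stage delays and no hazards, as on this slide. The function names and the sample values of n and k in the loop are illustrative choices.

```python
def pipeline_time(n, k, tau=1.0):
    """T(n,k) = (n + k - 1) * tau for a k-stage pipeline with equal stage delays."""
    return (n + k - 1) * tau


def conventional_time(n, k, tau=1.0):
    """T(n,1) = n * k * tau: each instruction goes through all k steps before the next starts."""
    return n * k * tau


def speedup(n, k):
    """S_k = T(n,1) / T(n,k) = n*k / (n + k - 1); tau cancels out."""
    return conventional_time(n, k) / pipeline_time(n, k)


if __name__ == "__main__":
    # One line per instruction count, one speedup value per pipeline depth.
    for n in (1, 2, 4, 8, 16, 32, 64, 128):
        print(n, [round(speedup(n, k), 2) for k in (6, 9, 12)])
```

For a single instruction the speedup is 1, and as n grows the value approaches k without reaching it.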

SLIDE 11

Pipeline Performance

[Figure: two plots. First plot: speedup versus number of instructions (1 to 128, logarithmic axis) for k = 6, k = 9 and k = 12 stages. Second plot: speedup versus number of stages (0 to 20) for n = 10, n = 20 and n = 30 instructions.]

SLIDE 12

Pipeline Performance

The optimal performance is never reached because:

• The execution time differs from stage to stage: $\tau_1 \ne \tau_2 \ne \dots \ne \tau_k$
• There is still a time delay $d$ to latch the output of each stage
• Pipeline hazards

The cycle time is therefore set by the slowest stage plus the latch delay:

$$\tau = \max_i \{\tau_i\} + d$$
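The effect of the first two limitations can be sketched numerically. The example assumes that the pipeline clock is set by the slowest stage plus the latch delay, that hazards are ignored, and that the non-pipelined machine needs the sum of the stage delays per instruction with no latch overhead. The stage delays used in the test values are made up for illustration.

```python
def cycle_time(stage_delays, latch_delay):
    """Clock period: slowest stage plus the time to latch its output."""
    return max(stage_delays) + latch_delay


def pipelined_time(n, stage_delays, latch_delay):
    """(n + k - 1) cycles of the period above, with k = number of stages."""
    k = len(stage_delays)
    return (n + k - 1) * cycle_time(stage_delays, latch_delay)


def unpipelined_time(n, stage_delays):
    """Assumption: without a pipeline each instruction takes the sum of all stage delays."""
    return n * sum(stage_delays)


def speedup(n, stage_delays, latch_delay):
    return unpipelined_time(n, stage_delays) / pipelined_time(n, stage_delays, latch_delay)
```

With hypothetical delays of 2, 1, 1, 2, 3 and 1 time units and a latch delay of 1, the period is 4 and the speedup for 100 instructions is 1000 / 420, well below the ideal value of 6.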


SLIDE 14

Pipeline Hazards

• In some cases a portion of the pipeline must stall because of so-called hazards
• Such a stall is also called a pipeline bubble
• Types of hazards
  • Resource
  • Data
  • Control


SLIDE 16

Resource Hazards

Also called structural hazards. They occur when multiple instructions need the same resource, e.g., a single-port memory.

[Figure: two timing diagrams for instruction 1 to instruction 4, time 1 to 14, stages FI DI CO FO EI WO. The first shows operation without conflicts; the second shows idle slots caused by the resource conflict.]
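A resource conflict on a single-port memory can be illustrated with a small scheduler. This is a sketch under stated assumptions, not a reconstruction of the figure: instructions proceed in order, a stage holds one instruction at a time, older instructions have priority on the memory port, and each instruction is described by the hypothetical set of stages in which it accesses memory.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def schedule(mem_stages_per_instruction):
    """Return, per instruction, the time unit at which it enters each stage.

    mem_stages_per_instruction: one set per instruction naming the stages
    in which that instruction uses the single memory port.
    """
    busy = set()   # time units in which the memory port is already taken
    prev = None    # schedule of the previous instruction
    result = []
    for mem_stages in mem_stages_per_instruction:
        times = []
        for s, stage in enumerate(STAGES):
            t = times[-1] + 1 if times else 1
            if prev is not None:
                # A stage becomes free when the previous instruction moves on.
                free = prev[s + 1] if s + 1 < len(STAGES) else prev[s] + 1
                t = max(t, free)
            while stage in mem_stages and t in busy:
                t += 1  # idle slot: wait for the memory port
            if stage in mem_stages:
                busy.add(t)
            times.append(t)
        result.append(times)
        prev = times
    return result


if __name__ == "__main__":
    # Assumed workload: only instruction 1 fetches an operand from memory.
    for row in schedule([{"FI", "FO"}, {"FI"}, {"FI"}, {"FI"}]):
        print(row)
```

In this assumed workload the operand fetch of instruction 1 at time 4 occupies the memory port, so the instruction fetch of instruction 4 slips from time 4 to time 5 and the whole instruction finishes one time unit late.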


SLIDE 18

Data Hazards

• Conflict in the access of an operand location
• Two instructions are to be executed in sequence
• Both access a particular memory or register operand
• Example:

    ADD EAX,EBX
    SUB ECX,EAX

[Figure: timing diagram, time 1 to 14, with rows ADD EAX,EBX, SUB ECX,EAX, instruction 3 and instruction 4. Three slots are marked idle.]
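The dependence in the example can be detected mechanically. The sketch below assumes the two-operand form `OP DST,SRC` in which the destination is both read and written, which holds for ADD and SUB; the helper names are hypothetical and the parser handles only this simple register form.

```python
def parse(instruction):
    """Split 'OP DST,SRC' into (opcode, destination, source)."""
    opcode, operands = instruction.split(None, 1)
    dst, src = (part.strip().upper() for part in operands.split(","))
    return opcode.upper(), dst, src


def reads_and_writes(instruction):
    """Operands read and written, assuming DST = DST op SRC."""
    _, dst, src = parse(instruction)
    return {dst, src}, {dst}


def raw_hazard(first, second):
    """Operands written by `first` that `second` reads (read after write)."""
    _, written = reads_and_writes(first)
    read, _ = reads_and_writes(second)
    return written & read


if __name__ == "__main__":
    print(raw_hazard("ADD EAX,EBX", "SUB ECX,EAX"))
```

For the pair on this slide the function reports EAX: SUB needs the value of EAX that ADD has not yet written, so SUB cannot fetch its operand until ADD has completed its write.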


SLIDE 20

Control Hazards

Also called branch hazards. From which address should the instruction that follows a conditional branch be fetched? The answer is known only late in the pipeline, at the point marked "known only here" in the figure.

[Figure: two timing diagrams, time 1 to 14, with rows JNZ ADDRESS, ADD EAX,EBX, instruction 3 and instruction 4. Callouts: "known only here", "no memory conflict!", "idle".]
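The cost of simply waiting for the branch outcome can be estimated with a few lines. The sketch assumes the six-stage pipeline of slide 7, that fetching stops after every conditional branch until the branch has passed a given stage, and that the outcome is known at the end of EI; the resolving stage is an assumption, since the slide only marks the point in a figure.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def stall_penalty(resolve_stage, stages=STAGES):
    """Idle fetch slots after a branch that is resolved at the end of `resolve_stage`.

    The branch reaches stage number r (1-based) r - 1 time units after its
    fetch, so the next fetch is delayed by r - 1 slots.
    """
    return stages.index(resolve_stage)


def total_time(n_instructions, n_branches, resolve_stage="EI", stages=STAGES):
    """Time units for n instructions when every branch stalls the fetch stage."""
    ideal = n_instructions + len(stages) - 1
    return ideal + n_branches * stall_penalty(resolve_stage, stages)


if __name__ == "__main__":
    print(total_time(4, 0), total_time(4, 1))
```

Under these assumptions a four-instruction sequence that starts with one conditional branch needs 13 time units instead of 9, which is the motivation for the techniques on the following slides.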

SLIDE 21

Control Hazards

Dealing with Branches

• Multiple streams
  • Two pipelines
  • Prefetch each branch into a separate pipeline (IBM 370/168 and IBM 3033). One pipeline always produces no useful work.
• Prefetch branch target
  • The target of the branch is prefetched in addition to the instructions following the branch; the target is kept until the branch is executed (IBM 360/91)
• Loop buffer
  • Very fast memory maintained by the fetch stage of the pipeline. The buffer is checked before fetching from memory (CRAY-1)
• Branch prediction
• Delayed branching

SLIDE 22

Branch Prediction

Concept:

• Instead of delaying the fetch of the next instruction, its address is predicted
• Results are stored in temporary registers
• If the prediction is correct, the results are made definitive
• If the prediction is incorrect, the results are flushed and fetching restarts from the right address

SLIDE 23

Branch Prediction

Static Methods:

• Predict "never taken" or "always taken"
• Predicted by opcode
  • There are two codes for each branch instruction → 1 bit indicates "predict taken" or "predict not taken"
  • The compiler analyses the code, guesses, and generates the appropriate branch code
  • The processor follows the compiler's suggestion
  • Implies code incompatibility with previous processors

SLIDE 24

Branch Prediction

Dynamic Methods

Based on recent branch history

[Figure: Branch History Table with the columns branch instruction address, target address and state.]

[Figure: Branch Prediction State Diagram with four states, two labelled "predict taken" and two labelled "predict not taken", connected by transitions labelled "taken" and "not taken".]
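A history-based predictor with four states can be sketched as follows. The exact transitions of the slide's state diagram are not recoverable from the text, so the example uses the common two-bit saturating counter as a stand-in; the slide's diagram may wire the states differently. The class and function names are illustrative, and the target address column of the table is left out for brevity.

```python
class BranchHistoryTable:
    """Maps a branch instruction address to a two-bit saturating counter.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """

    def __init__(self, initial_state=0):
        self.initial_state = initial_state
        self.state = {}  # branch instruction address -> counter value

    def predict(self, address):
        """Return True when the branch at `address` is predicted taken."""
        return self.state.get(address, self.initial_state) >= 2

    def update(self, address, taken):
        """Move one step towards the observed outcome, saturating at 0 and 3."""
        counter = self.state.get(address, self.initial_state)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.state[address] = counter


def count_mispredictions(table, trace):
    """`trace` is a sequence of (branch address, taken) pairs in execution order."""
    wrong = 0
    for address, taken in trace:
        if table.predict(address) != taken:
            wrong += 1
        table.update(address, taken)
    return wrong
```

With this counter a single "not taken" outcome does not flip a well-established "taken" prediction, so a loop-closing branch that is taken many times and falls through once costs one misprediction per execution of the loop once the counter has warmed up.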

SLIDE 25

Delayed Branch

Concept

The branch takes effect only after the execution of the following instruction, which reduces the branch penalty.

Conventional branch:

    MOV EDX,ECX
    ADD EAX,[EBX]
    JZ LA
    instruction
    ...
LA: instruction

Delayed branch:

    ADD EAX,[EBX]
    JZ LA
    MOV EDX,ECX      (always executed)
    instruction
    ...
LA: instruction

[Figure: each listing appears twice, once with the "not taken" path marked and once with the "taken" path marked.]

SLIDE 26

Delayed Branch

Example: conventional branch, prediction wrong

[Figure: timing diagram, time 1 to 14, with rows labelled ADD EAX,[EBX], MOV EDX,ECX, JZ LA and instruction 1 to instruction 4. Some rows contain only the first stages (FI DI FO, FI DI, FI); the corresponding interval is labelled "branch penalty".]

SLIDE 27

Delayed Branch

Example: delayed branch, prediction wrong

[Figure: timing diagram, time 1 to 14, with rows labelled ADD EAX,[EBX], MOV EDX,ECX, JZ LA and instruction 1 to instruction 4. Some rows contain only the first stages (FI DI, FI); the corresponding interval is labelled "branch penalty".]

SLIDE 28

Exercises

Exercise 1: Assume the six-stage pipeline shown in slide 7. Complete the graphs below, which represent the pipeline operation, assuming a single-port memory. Hint: take into consideration the memory accesses for instruction fetch, operand fetch and result write.

PROGRAM 1 (time 1 to 15)

    ADD EAX,[EBX+ESI]
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[ 4768]

PROGRAM 2 (time 1 to 15)

    ADD [EBX+ESI], EAX
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[ 4768]

SLIDE 29

Exercises

Exercise 2: Repeat the previous exercise assuming that there are separate instruction and data caches, so that accesses to instructions and operands may occur simultaneously.

PROGRAM 1 (time 1 to 15)

    ADD EAX,[EBX+ESI]
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[ 4768]

PROGRAM 2 (time 1 to 15)

    ADD [EBX+ESI], EAX
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[ 4768]

SLIDE 30

Pipelining

END