Pipelining
Raul Queiroz Feitosa
Parts of these slides are from the support material provided by W. Stallings
Objective
To present the Pipelining concept, its limitations and the techniques for performance optimization
Outline
Instruction Cycle
Instruction Pipelining
Pipeline Performance
Pipeline Hazards
    Resource
    Data
    Control
Instruction Cycle State Diagram
[Figure: state diagram of the instruction cycle. Surviving state labels: instruction address calculation, instruction fetch, instruction decoding, operand address calculation, data fetch, result address calculation, result store, interrupt check, interrupt. Surviving arc labels: indirection, multiple operands, multiple results, no interrupt.]
Instruction Pipelining
The instruction cycle is split into sequential steps
A specific hardware unit (pipeline stage) is built to perform each step
Pipeline stages are arranged as a chain
[Figure: chain of pipeline stages 1, 2, ..., k]
Instruction Pipelining - Example
FI: fetch instruction
DI: decode instruction
CO: calculate operand
FO: fetch operand
EI: execute instruction
WO: write operand (result)
Stage order: FI, DI, CO, FO, EI, WO
Instruction Pipeline Operation
[Figure: timing diagram of the six-stage pipeline. Each instruction enters FI one time unit after its predecessor and then advances one stage per time unit, so nine instructions complete in 14 time units.]
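The staggered pattern of the timing diagram can be reproduced with a few lines of Python. This is only a sketch of the ideal, hazard-free case; the function names `ideal_schedule` and `render` are illustrative and do not come from the slides.

```python
# Stage names in the order given on the slide.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def ideal_schedule(n, stages=STAGES):
    """Return one row per instruction; row[t] is the stage occupied
    at time unit t + 1, or "" if the instruction is not in the pipeline."""
    total_time = n + len(stages) - 1
    rows = []
    for i in range(n):
        row = [""] * total_time
        for s, name in enumerate(stages):
            # Instruction i (0-based) enters stage s at time unit i + s + 1.
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the schedule as a plain-text timing diagram."""
    header = "time  " + " ".join(f"{t:>2}" for t in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows, start=1):
        lines.append(f"I{i:<4} " + " ".join(f"{cell:>2}" for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(ideal_schedule(9)))
```

Calling `ideal_schedule(9)` yields rows of 14 time units, matching the nine-instruction diagram.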
Pipeline Performance
Assuming
    k stages
    $\tau = \tau_1 = \tau_2 = \dots = \tau_k$ ($\tau_i$ is the time delay of the i-th stage)
    $T_{n,k}$: time for a pipeline with k stages to execute n instructions
$T_{n,1} = n k \tau$ (conventional machine)
$T_{n,k} = k\tau + (n-1)\tau = (n+k-1)\tau$ (pipeline)
The speedup
    $S_k = \frac{T_{n,1}}{T_{n,k}} = \frac{nk\tau}{(n+k-1)\tau} = \frac{nk}{n+k-1}$
For large n
    $\lim_{n \to \infty} S_k = \lim_{n \to \infty} \frac{nk}{n+k-1} = k$
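The two execution times and the speedup can be checked numerically. The sketch below assumes equal stage delays, as the slide does; the function names are illustrative only.

```python
def time_conventional(n, k, tau=1.0):
    """T(n,1) = n * k * tau: each instruction takes all k steps in turn."""
    return n * k * tau


def time_pipelined(n, k, tau=1.0):
    """T(n,k) = (n + k - 1) * tau: k units to fill, then one result per unit."""
    return (n + k - 1) * tau


def speedup(n, k):
    """S_k = n*k / (n + k - 1); tau cancels out."""
    return time_conventional(n, k) / time_pipelined(n, k)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(n, k=6), 3))
```

As n grows, the printed values approach k = 6, which is the limit derived above.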
Pipeline Performance
[Figure, left: speedup versus number of instructions (1 to 128) for k = 6, 9 and 12 stages.]
[Figure, right: speedup versus number of stages (0 to 20) for n = 10, 20 and 30 instructions.]
Pipeline Performance
The optimal performance is never reached because:
    The execution time differs from stage to stage
    There is an additional time delay to latch the output of each stage
    Pipeline hazards
The cycle time is set by the slowest stage plus the latch delay d:
    $\tau = \max_{1 \le i \le k} \tau_i + d$
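A small sketch of how unequal stage delays and the latch delay reduce the speedup. It assumes, as a simplification not stated on the slide, that the non-pipelined machine needs the sum of the stage delays per instruction and pays no latch delay; the function names are illustrative.

```python
def cycle_time(stage_delays, latch_delay):
    """Pipeline cycle time: slowest stage plus the latch delay d."""
    return max(stage_delays) + latch_delay


def effective_speedup(n, stage_delays, latch_delay):
    """Speedup for n instructions with unequal stage delays.

    Assumption: the conventional machine takes sum(stage_delays)
    per instruction and has no latches between steps.
    """
    k = len(stage_delays)
    t_conventional = n * sum(stage_delays)
    t_pipelined = (n + k - 1) * cycle_time(stage_delays, latch_delay)
    return t_conventional / t_pipelined


if __name__ == "__main__":
    balanced = [1.0] * 6
    unbalanced = [1.0, 0.5, 0.5, 1.5, 1.0, 1.5]  # same total delay, made-up values
    print(effective_speedup(1000, balanced, 0.0))
    print(effective_speedup(1000, unbalanced, 0.1))
```

With equal delays and d = 0 the result reduces to nk/(n+k-1); any imbalance or latch delay lowers it.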
Pipeline Hazards
In some cases a portion of the pipeline must stall because of so-called hazards
    A stall is also called a pipeline bubble
Types of hazards
    Resource
    Data
    Control
Resource Hazards
Also called structural hazards; they occur when multiple instructions need the same resource at the same time, e.g., a single-port memory
[Figure: two timing diagrams for instructions 1 to 4 over time units 1 to 14: the ideal case, and the case in which a conflict on the shared resource inserts idle time units.]
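To see where a single-port memory causes trouble, one can list the time units at which more than one stage would touch memory in the ideal schedule. The sketch below only detects conflicts, it does not insert the stalls. It assumes that FI always accesses memory and that FO and WO do so only when the operand or result lives in memory; the name `memory_conflicts` is illustrative.

```python
from collections import Counter

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def memory_conflicts(reads_operand, writes_result):
    """Return the time units with more than one memory access.

    reads_operand[i]  -- True if instruction i fetches an operand from memory (FO)
    writes_result[i]  -- True if instruction i writes its result to memory (WO)
    Timing follows the ideal schedule: instruction i (0-based) is in
    stage s at time unit i + s + 1.
    """
    accesses = Counter()
    for i, (reads, writes) in enumerate(zip(reads_operand, writes_result)):
        accesses[i + STAGES.index("FI") + 1] += 1  # every instruction is fetched
        if reads:
            accesses[i + STAGES.index("FO") + 1] += 1
        if writes:
            accesses[i + STAGES.index("WO") + 1] += 1
    return sorted(t for t, count in accesses.items() if count > 1)


if __name__ == "__main__":
    # Four instructions; only the first one reads an operand from memory.
    print(memory_conflicts([True, False, False, False], [False] * 4))
```

In this run the operand fetch of the first instruction collides with the instruction fetch of the fourth, which is the kind of clash that forces an idle time unit.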
Data Hazards
Conflict in the access of an operand location
    Two instructions are to be executed in sequence
    Both access the same memory location or register
Example:
    ADD EAX,EBX
    SUB ECX,EAX
[Figure: timing diagram for ADD EAX,EBX, SUB ECX,EAX, instruction 3 and instruction 4 over time units 1 to 14; idle time units are inserted until the operand needed by the dependent instruction is available.]
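The dependency in this example is a read-after-write on EAX: ADD EAX,EBX writes EAX and SUB ECX,EAX reads it. A minimal sketch of such a check follows. It assumes two-operand register instructions in the `OP dest,src` form in which the destination is also read (as with ADD and SUB); the helper names are illustrative.

```python
def operands(instruction):
    """Split 'OP dest,src' into (mnemonic, dest, src), upper-cased."""
    mnemonic, rest = instruction.split(None, 1)
    dest, src = (op.strip().upper() for op in rest.split(","))
    return mnemonic.upper(), dest, src


def raw_hazard(first, second):
    """True if `second` reads a register that `first` writes.

    Assumption: both are two-operand arithmetic instructions, so the
    destination register of `second` is read as well as written.
    """
    _, written, _ = operands(first)
    _, dest, src = operands(second)
    return written in (dest, src)


if __name__ == "__main__":
    print(raw_hazard("ADD EAX,EBX", "SUB ECX,EAX"))
    print(raw_hazard("ADD EAX,EBX", "SUB ECX,EDX"))
```

The first pair shares EAX and is flagged; the second pair has no register in common.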
Control Hazards
Also called branch hazards.
From which address should the instruction following a conditional branch be fetched? This is known only late in the branch instruction's passage through the pipeline.
[Figure: two timing diagrams for JNZ ADDRESS, ADD EAX,EBX, instruction 3 and instruction 4 over time units 1 to 14. Annotations: the point at which the branch outcome is "known only here", "no memory conflict", and an idle time unit.]
Dealing with Branches
Multiple Streams
    Two pipelines: each branch path is prefetched into a separate pipeline (IBM 370/168 and IBM 3033)
    One of the two pipelines always ends up doing no useful work
Prefetch Branch Target
    The branch target is prefetched in addition to the instructions following the branch
    The target is kept until the branch is executed (IBM 360/91)
Loop Buffer
    Very fast memory maintained by the fetch stage of the pipeline
    The buffer is checked before fetching from memory (CRAY-1)
Branch Prediction
Delayed Branching
Control Hazards
Branch Prediction
Concept:
    Instead of delaying the fetch of the next instruction, the branch outcome is predicted and fetching continues along the predicted path
    Results are stored in temporary registers
    If the prediction is correct, the results are made definitive
    If the prediction is incorrect, the results are flushed and fetching restarts from the right address
Branch Prediction
Static Methods:
    Predict "never taken" or "always taken"
    Predict by opcode
        There are two opcodes for each branch instruction: 1 bit indicates "predict taken" or "predict not taken"
        The compiler analyses the code, guesses, and generates the appropriate branch opcode
        The processor follows the compiler's suggestion
        This implies code incompatibility with previous processors
Branch Prediction
Dynamic Methods
Based on recent branch history
[Figure: Branch History Table with columns: branch instruction address, target address, state.]
[Figure: Branch Prediction State Diagram with two "predict taken" states and two "predict not taken" states, connected by "taken" and "not taken" transitions.]
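A history-based predictor with four states can be sketched as a 2-bit saturating counter per branch address. This is a common variant and an assumption here: its transitions are not guaranteed to match the state diagram on the slide, and the table below stores only the state, not the target address. The class name `TwoBitPredictor` is illustrative.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    Unknown branches start at 0 (an arbitrary choice for this sketch).
    """

    def __init__(self):
        self.table = {}  # branch instruction address -> counter state

    def predict(self, address):
        return self.table.get(address, 0) >= 2

    def update(self, address, taken):
        state = self.table.get(address, 0)
        # Move one step towards the observed outcome, saturating at 0 and 3.
        self.table[address] = min(3, state + 1) if taken else max(0, state - 1)


def count_mispredictions(predictor, address, outcomes):
    """Run the predictor over a sequence of outcomes for one branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken nine times, then not taken, run twice.
    history = ([True] * 9 + [False]) * 2
    print(count_mispredictions(TwoBitPredictor(), 0x1000, history))
```

Because two consecutive wrong guesses are needed to flip the prediction, the single "not taken" at each loop exit does not disturb the prediction for the next run of the loop.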
Delayed Branch
Concept
    The branch takes effect only after the execution of the following instruction
    This reduces the branch penalty

conventional branch
    MOV EDX,ECX
    ADD EAX,[EBX]
    JZ LA
    instruction            (executed next if the branch is not taken)
    ...
    LA: instruction        (executed next if the branch is taken)

delayed branch
    ADD EAX,[EBX]
    JZ LA
    MOV EDX,ECX            (always executed)
    instruction            (executed next if the branch is not taken)
    ...
    LA: instruction        (executed next if the branch is taken)
Delayed Branch
Example: conventional branch, prediction wrong
[Figure: timing diagram over time units 1 to 14 for MOV EDX,ECX, ADD EAX,[EBX], JZ LA and instructions 1 to 4. The instructions fetched after the branch are abandoned partway through the pipeline, and the lost time units are marked as the branch penalty.]
Delayed Branch
Example: delayed branch, prediction wrong
[Figure: timing diagram over time units 1 to 14 for ADD EAX,[EBX], JZ LA, MOV EDX,ECX and instructions 1 to 4, with the branch penalty marked.]
Exercises
Exercise 1: Assume the pipeline shown in slide 7 containing 6 stages. Complete the graphs below that represent the pipeline operation assuming a single-port memory. Hint: take into consideration the memory accesses for instruction fetch, operand fetch and result write.

PROGRAM 1 (time units 1 to 15)
    ADD EAX,[EBX+ESI]
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]

PROGRAM 2 (time units 1 to 15)
    ADD [EBX+ESI],EAX
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]

Exercises
Exercise 2: Repeat the previous exercise assuming that there are separate instruction and data caches, so that accesses to instructions and operands may proceed in parallel.

PROGRAM 1 (time units 1 to 15)
    ADD EAX,[EBX+ESI]
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]

PROGRAM 2 (time units 1 to 15)
    ADD [EBX+ESI],EAX
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]