Appendix A
Pipelining: Basic and Intermediate Concepts

Overview
– Basics of Pipelining
– Pipeline Hazards
– Pipeline Implementation
– Pipelining + Exceptions
– Pipeline
ALU operation = 2 nsec, Register file access = 1 nsec;
[Figure: non-pipelined execution of ld r1, 100(r4); ld r2, 200(r5); ld r3, 300(r6). Horizontal axis: Time (2 to 18). Vertical axis: Program execution order (in instructions). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg and takes 8 ns.]
Every instruction needs 4 clock cycles (i.e. 8 nsec) to execute.
Three instructions take 12 clock cycles (i.e. 24 nsec). CPI = 12 cycles / 3 instructions = 4 cycles / instruction.
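The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 2 nsec clock period is inferred from "4 clock cycles (i.e. 8 nsec)", and the function name is made up for illustration.

```python
CLOCK_NS = 2                # assumed clock period: 4 cycles take 8 nsec
CYCLES_PER_INSTRUCTION = 4  # every instruction needs 4 clock cycles


def unpipelined_totals(n_instructions):
    """Return (total cycles, total nsec, CPI) when instructions do not overlap."""
    cycles = n_instructions * CYCLES_PER_INSTRUCTION
    return cycles, cycles * CLOCK_NS, cycles / n_instructions


if __name__ == "__main__":
    print(unpipelined_totals(3))  # three loads, as in the figure
```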
[Figures: tasks plotted against time. Horizontal axis: Time. Vertical axis: Task Order.]
Stage latencies: IF = 5 ns, ID = 4 ns, EX = 5 ns, MEM = 10 ns, WB = 4 ns

$$L = lat(IF) + lat(ID) + lat(EX) + lat(MEM) + lat(WB)$$
$$L = 5\,ns + 4\,ns + 5\,ns + 10\,ns + 4\,ns = 28\,ns$$

I1: IF ID EX MEM WB   L(I1) = 28ns
I2: IF ID EX MEM WB   L(I2) = 33ns
I3: IF ID EX MEM WB   L(I3) = 38ns
I4: IF ID EX MEM WB   L(I4) = 43ns

We are in trouble! The latency is not constant. This happens because this is an unbalanced pipeline. Solution: make every stage the same length as the longest one.
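The growing latencies can be reproduced with a small timing model. The sketch below is an assumption about how the numbers arise, not the slide author's method: each stage handles one instruction at a time, an instruction may wait in front of a busy stage, and a new instruction is fetched as soon as IF is free. The function name is hypothetical.

```python
def instruction_latencies(stage_latencies, n_instructions):
    """Latency (fetch start to WB end) of each instruction, in ns."""
    free_at = [0] * len(stage_latencies)  # time at which each stage becomes free
    latencies = []
    for _ in range(n_instructions):
        start = free_at[0]  # fetch begins as soon as IF is free
        t = start
        for s, lat in enumerate(stage_latencies):
            t = max(t, free_at[s]) + lat  # wait for the stage, then occupy it
            free_at[s] = t
        latencies.append(t - start)
    return latencies


if __name__ == "__main__":
    unbalanced = [5, 4, 5, 10, 4]  # IF, ID, EX, MEM, WB in ns
    balanced = [10] * 5            # every stage as long as the longest one
    print(instruction_latencies(unbalanced, 4))  # [28, 33, 38, 43]
    print(instruction_latencies(balanced, 4))    # [50, 50, 50, 50]
```

With balanced stages the latency is higher (50 ns) but constant, and one instruction completes every 10 ns.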
[Figure: tasks plotted against time. Horizontal axis: Time. Vertical axis: Task Order.]
Depth of the pipeline
IF  ID  EX  MEM WB
    IF  ID  EX  MEM WB
        IF  ID  EX  MEM WB
            IF  ID  EX  MEM WB
                IF  ID  EX  MEM WB
– Capacitance-charge-discharge rates
– Repeaters used to drive current, handle fan-out problems
– Time to charge/discharge adds to delay
– Dominant problem in old integration densities.
– Problem with this approach is power requirements go up
– Power dissipation becomes a problem.
– Speed-of-light propagation delays
much lower.
consume a large part of the clock cycle)
$$CPI_{pipelined} = Ideal\ CPI + Pipeline\ stall\ clock\ cycles\ per\ instr$$

$$Speedup = \frac{Ideal\ CPI \times Pipeline\ depth}{Ideal\ CPI + Pipeline\ stall\ per\ instr} \times \frac{Clock\ Cycle_{unpipelined}}{Clock\ Cycle_{pipelined}}$$

With an Ideal CPI of 1:

$$Speedup = \frac{Pipeline\ depth}{1 + Pipeline\ stall\ CPI} \times \frac{Clock\ Cycle_{unpipelined}}{Clock\ Cycle_{pipelined}}$$
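A short derivation may help connect the CPI and speedup expressions. It is a sketch under one assumption not stated on the slides: an unpipelined instruction takes Ideal CPI × Pipeline depth cycles, as if it occupied every stage in turn.

$$Speedup = \frac{CPI_{unpipelined} \times Clock\ Cycle_{unpipelined}}{CPI_{pipelined} \times Clock\ Cycle_{pipelined}}$$

$$CPI_{unpipelined} = Ideal\ CPI \times Pipeline\ depth \quad \text{(assumption)}$$

$$Speedup = \frac{Ideal\ CPI \times Pipeline\ depth}{Ideal\ CPI + Pipeline\ stall\ per\ instr} \times \frac{Clock\ Cycle_{unpipelined}}{Clock\ Cycle_{pipelined}}$$

Setting Ideal CPI = 1 gives the simplified form with 1 + Pipeline stall CPI in the denominator.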
[Figure: overlapped instructions with a multicycle FP Multiply unit. Integer instructions flow through IF ID EX MEM WB. FP Multiply instructions flow through IF ID M1 M2 M3 M4 M5 MEM WB.]
implementation has a 1.05 times faster clock rate

$$SpeedUp_A = \frac{Pipeline\ Depth}{1 + 0} \times \frac{clock_{unpipe}}{clock_{pipe}} = Pipeline\ Depth$$

$$SpeedUp_B = \frac{Pipeline\ Depth}{1 + 0.4 \times 1} \times \frac{clock_{unpipe}}{clock_{unpipe} / 1.05} = \frac{Pipeline\ Depth}{1.4} \times 1.05 = 0.75 \times Pipeline\ Depth$$

$$\frac{SpeedUp_A}{SpeedUp_B} = \frac{Pipeline\ Depth}{0.75 \times Pipeline\ Depth} = 1.33$$
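The comparison can be replayed numerically. The sketch below uses the simplified speedup formula (Ideal CPI = 1). The function name is hypothetical, and the pipeline depth of 5 is an arbitrary choice, since it cancels in the ratio.

```python
def speedup(pipeline_depth, stall_cpi, clock_ratio):
    """Speedup = depth / (1 + stall CPI) x (unpipelined clock / pipelined clock)."""
    return pipeline_depth / (1 + stall_cpi) * clock_ratio


if __name__ == "__main__":
    depth = 5  # arbitrary; any depth gives the same ratio
    a = speedup(depth, stall_cpi=0.0, clock_ratio=1.0)
    b = speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)
    print(f"A = {a:.2f}, B = {b:.2f}, A/B = {a / b:.2f}")
```

Machine A comes out about 1.33 times faster even though B has the faster clock, because B's stalls cost more than its clock advantage recovers.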
Slow code:
LW Rb,b          IF ID EX MEM WB
LW Rc,c          IF ID EX MEM WB
ADD Ra,Rb,Rc     IF ID EX MEM WB
SW a,Ra          IF ID EX MEM WB
LW Re,e          IF ID EX MEM WB
LW Rf,f          IF ID EX MEM WB
SUB Rd,Re,Rf     IF ID EX MEM WB
SW d,Rd          IF ID EX MEM WB

Fast code:
LW Rb,b          IF ID EX MEM WB
LW Rc,c          IF ID EX MEM WB
LW Re,e          IF ID EX MEM WB
ADD Ra,Rb,Rc     IF ID EX MEM WB
LW Rf,f          IF ID EX MEM WB
SW a,Ra          IF ID EX MEM WB
SUB Rd,Re,Rf     IF ID EX MEM WB
SW d,Rd          IF ID EX MEM WB
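Why the reordered sequence is faster can be sketched by counting load-use stalls. The model below is an assumption, not taken from the slides: with forwarding, the only stall is one cycle when an instruction reads a register loaded by the instruction immediately before it. The tuple encoding and function name are made up for illustration.

```python
# Each instruction: (opcode, destination register or None, source registers).
SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]


def load_use_stalls(program):
    """Count instructions that read a register loaded by their predecessor."""
    stalls = 0
    for (op, dest, _), (_, _, sources) in zip(program, program[1:]):
        if op == "LW" and dest in sources:
            stalls += 1
    return stalls


if __name__ == "__main__":
    print(load_use_stalls(SLOW), load_use_stalls(FAST))  # 2 0
```

Under this model the slow code stalls twice (ADD after LW Rc, SUB after LW Rf) and the fast code not at all.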
Branch              IF    ID    EX    MEM   WB
Branch successor          IF    stall stall IF    ID    EX    MEM   WB
Branch successor+1                                IF    ID    EX    MEM   WB
Branch successor+2                                      IF    ID    EX    MEM   WB
Branch successor+3                                            IF    ID    EX    MEM
Branch successor+4                                                  IF    ID    EX
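The cost of this stall can be estimated with the CPI formula from earlier in the deck. In the sketch below the 3-cycle penalty is read off the diagram (the successor's useful IF starts three cycles late). The branch frequency is a hypothetical input, since the slides give no value here, and the function name is made up.

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles=3, ideal_cpi=1.0):
    """CPI = ideal CPI + (fraction of instructions that are branches) x penalty."""
    return ideal_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    # 0.3 is an illustrative branch frequency, not a figure from the slides.
    print(cpi_with_branch_stalls(0.3))
```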
Untaken Branch     IF ID EX MEM WB
Instruction i+1    IF ID EX MEM WB
Instruction i+2    IF ID EX MEM WB
Instruction i+3    IF ID EX MEM WB

Taken Branch       IF ID EX MEM WB
Instruction i+1    IF stall stall stall stall (clear the IF/ID register)
Branch target      IF ID EX MEM WB
Branch target+1    IF ID EX MEM WB
Branch target+2    IF ID EX MEM WB

Compiler organizes code so that the most frequent path is the not-taken one

Untaken Branch     IF ID EX MEM WB
Instruction i+1    IF stall stall stall stall (clear the IF/ID register)
Instruction i+2    IF ID EX MEM WB
Instruction i+3    IF ID EX MEM WB
Instruction i+4    IF ID EX MEM WB

Taken Branch       IF ID EX MEM WB
Instruction i+1    IF ID EX MEM WB
Branch target      IF ID EX MEM WB
Branch target+1    IF ID EX MEM WB
Branch target+2    IF ID EX MEM WB
[Figure: three panels: From before, From target, From fall through]

a) From before | Branch must not depend on delayed instruction | Always
b) From target | Must be OK to execute delayed instruction if branch is not taken | When branch is taken
c) From fall through | Must be OK to execute delayed instruction if branch is taken | When branch is not taken
If branch is almost always taken
If branch is almost never taken
[Figure: three instruction formats with field boundaries at bits 5/6, 10/11, 15/16, 31; at bits 5/6, 10/11, 15/16, 20/21, 31; and at bits 5/6, 31]
DADD R5, R6, R7
DSUB R8, R6, R7
OR R9, R6, R7

DADD R5, R1, R7
DSUB R8, R6, R7
OR R9, R6, R7

DADD R5, R6, R7
DSUB R8, R1, R7
OR R9, R6, R7

DADD R5, R6, R7
DSUB R8, R6, R7
OR R9, R1, R7
[Figure: LW R1, 0(R2); SUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9 flowing through IM, Reg, ALU, DM, Reg]

LW R1, 0(R2)     IF    ID    EX    MEM   WB
SUB R4, R1, R5         IF    ID    stall EX    MEM   WB
AND R6, R1, R7               IF    stall ID    EX    MEM   WB
OR R8, R1, R9                      stall IF    ID    EX    MEM   WB
ID/EX.IR 0..5    IF/ID.IR 0..5                   Comparison
Load             r-r ALU                         ID/EX.IR[RT] == IF/ID.IR[RS]
Load             r-r ALU                         ID/EX.IR[RT] == IF/ID.IR[RT]
Load             Load, Store, r-i ALU, branch    ID/EX.IR[RT] == IF/ID.IR[RS]
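The three comparisons in the load interlock table can be read as a single predicate. The sketch below is illustrative only: the `Instr` class, its field names, and the instruction-kind strings are hypothetical stand-ins for the opcode and register fields of the ID/EX and IF/ID pipeline registers.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    kind: str  # "load", "store", "rr_alu", "ri_alu" or "branch" (hypothetical labels)
    rs: int    # register number in the RS field
    rt: int    # register number in the RT field


def needs_load_interlock(id_ex, if_id):
    """True if the instruction in IF/ID must stall behind a load in ID/EX."""
    if id_ex.kind != "load":
        return False
    if if_id.kind == "rr_alu":
        # Rows 1 and 2: a register-register ALU op reads both RS and RT.
        return id_ex.rt in (if_id.rs, if_id.rt)
    if if_id.kind in ("load", "store", "ri_alu", "branch"):
        # Row 3: these instructions are checked on RS only.
        return id_ex.rt == if_id.rs
    return False


if __name__ == "__main__":
    lw = Instr("load", rs=2, rt=1)     # LW R1, 0(R2)
    sub = Instr("rr_alu", rs=1, rt=5)  # SUB R4, R1, R5
    print(needs_load_interlock(lw, sub))  # True: SUB reads R1 right after the load
```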
[Figure: instructions (IF ID EX WB) accessing CPU, Cache, Memory and Disk. Labels: Complete, Suspend Execution, Trap addr, Exception handling procedure.]
[Figure: overlapped instructions (IF ID EX M WB), each carrying an Exception Status Vector. Label: Check exceptions here.]
IF ID M1 M2 M3 M4 M5 M6 M7 Mem WB
IF ID A1 A2 A3 A4 Mem WB
IF ID EX Mem WB

LD F4, 0(R2)        IF ID EX Mem WB
MULTD F0, F4, F6    IF ID stall M1 M2 M3 M4 M5 M6 M7 Mem WB
ADD F2, F0, F8      IF stall ID stall stall stall stall stall stall A1 A2 A3 A4 Mem WB
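The stall counts in this example (one for MULTD, seven for ADD) can be reproduced with a small in-order timing model. Everything below is a sketch under stated assumptions: results are forwarded as soon as the last execute stage (or Mem, for loads) finishes, structural hazards on Mem and WB are ignored, and all names are hypothetical.

```python
# Cycles from the first execute stage until the result can be forwarded
# (assumed: EX + Mem for a load, M1..M7 for a multiply, A1..A4 for an add).
RESULT_LATENCY = {"load": 2, "fmul": 7, "fadd": 4}


def stall_cycles(program):
    """Stalls per instruction for an in-order pipeline with forwarding."""
    ready = {}  # register -> first cycle in which its value can be consumed
    stalls = []
    prev_id_start, prev_ex_start = 0, 0
    for index, (unit, dest, sources) in enumerate(program):
        fetch = 1 if index == 0 else prev_id_start  # IF frees when predecessor enters ID
        id_start = max(fetch + 1, prev_ex_start)    # ID frees when predecessor leaves it
        operands_ready = max((ready.get(r, 0) for r in sources), default=0)
        ex_start = max(id_start + 1, operands_ready)
        stalls.append(ex_start - fetch - 2)         # cycles beyond one IF and one ID
        ready[dest] = ex_start + RESULT_LATENCY[unit]
        prev_id_start, prev_ex_start = id_start, ex_start
    return stalls


if __name__ == "__main__":
    program = [
        ("load", "F4", ("R2",)),       # LD F4, 0(R2)
        ("fmul", "F0", ("F4", "F6")),  # MULTD F0, F4, F6
        ("fadd", "F2", ("F0", "F8")),  # ADD F2, F0, F8
    ]
    print(stall_cycles(program))  # [0, 1, 7]
```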