CS252/Culler Lec 15.1 3/12/02
CS252 Graduate Computer Architecture
Lecture 15: Instruction Level Parallelism and Dynamic Execution
March 11, 2002
Prof. David E. Culler
Computer Science 252, Spring 2002
Recall from Pipelining Review
- Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
– Ideal pipeline CPI: measure of the maximum performance attainable by the implementation
– Structural hazards: HW cannot support this combination of instructions
– Data hazards: instruction depends on result of prior instruction still in the pipeline
– Control hazards: caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
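As a worked instance of the CPI decomposition above, the following minimal sketch adds per-instruction stall contributions to an ideal CPI. The function name `pipeline_cpi` and all stall rates are hypothetical values chosen for illustration; they do not come from the lecture.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI as the ideal CPI plus the average stall cycles per
    instruction contributed by each hazard class."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical stall rates (cycles per instruction), for illustration only.
cpi = pipeline_cpi(ideal_cpi=1.0,
                   structural_stalls=0.05,
                   data_hazard_stalls=0.20,
                   control_stalls=0.15)
slowdown = cpi / 1.0  # relative to the assumed ideal CPI of 1.0
print(f"CPI = {cpi:.2f}, slowdown vs. ideal = {slowdown:.2f}x")
```

Each technique discussed later in the lecture attacks one or more of the four terms, so a breakdown of this kind shows where the largest gain is available.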
Recall Data Hazard Resolution: In-order issue, in-order completion
[Figure: pipeline timing diagram. Horizontal axis: Time (clock cycles); vertical axis: Instr. Order. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg; a bubble is inserted in the instructions that follow the lw.]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r2,r7
or r8,r2,r9
- Extend to multiple instruction issue?
- What if load had longer delay?
- Can and issue?
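The questions on this slide can be explored with a small timing model. The sketch below assumes a single-issue, in-order pipeline with full forwarding, in which an ALU result can feed the very next instruction and a load result needs `load_use_delay` extra bubble cycles. The model, the `Instr` type, and the function name are assumptions made for illustration, not the lecture's own formulation.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def in_order_ex_cycles(program: List[Instr], load_use_delay: int = 1) -> List[int]:
    """Cycle (relative to the first instruction) in which each instruction
    executes under the assumed in-order, fully forwarded model."""
    ready = {}   # register -> earliest cycle in which a consumer may execute
    cycles = []
    prev = -1
    for ins in program:
        earliest = prev + 1  # in-order: never pass the previous instruction
        for r in ins.srcs:
            earliest = max(earliest, ready.get(r, 0))
        cycles.append(earliest)
        latency = 1 + load_use_delay if ins.op == "lw" else 1
        ready[ins.dest] = earliest + latency
        prev = earliest
    return cycles


program = [
    Instr("lw",  "r1", ("r2",)),
    Instr("sub", "r4", ("r1", "r6")),
    Instr("and", "r6", ("r2", "r7")),
    Instr("or",  "r8", ("r2", "r9")),
]
print(in_order_ex_cycles(program))                    # [0, 2, 3, 4]
print(in_order_ex_cycles(program, load_use_delay=3))  # [0, 4, 5, 6]
```

In this model `and` and `or` do not depend on the load, yet they are delayed by the same number of cycles as `sub` because they cannot issue ahead of it. A longer load delay pushes all three back further, which is the cost that out-of-order techniques aim to remove.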
In-Order Issue, Out-of-order Completion
- Which hazards are present? RAW? WAR? WAW?
    load r3 <- r1, r2
    add  r1 <- r5, r2
    sub  r3 <- r3, r1   (or r3 <- r2, r1)
- Register Reservations
– at issue, mark destination register busy till complete
– check all register reservations before issue
[Figure: pipeline stage diagram: Ifetch, Reg, ALU/Add, DMem, DMem', Reg]
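The hazard question and the register-reservation rule on this slide can be sketched in a few lines. The code below classifies hazards between an earlier and a later instruction by comparing register names, and models reservations as a set of busy registers. The names `Instr`, `hazards`, and `can_issue` are hypothetical, and treating "check all register reservations" as a check of both the sources and the destination is an assumption about what the slide intends.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Hazard classes between two instructions in program order."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")   # later writes what earlier reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


def can_issue(instr: Instr, busy: Set[str]) -> bool:
    """Assumed reservation check: stall while any source or the
    destination register is still marked busy."""
    return not any(r in busy for r in (instr.dest, *instr.srcs))


load = Instr("load", "r3", ("r1", "r2"))
add = Instr("add", "r1", ("r5", "r2"))
sub = Instr("sub", "r3", ("r3", "r1"))
sub_alt = Instr("sub", "r3", ("r2", "r1"))

print(hazards(load, add))      # WAR on r1
print(hazards(load, sub))      # RAW and WAW on r3
print(hazards(load, sub_alt))  # WAW on r3 only

busy = {load.dest}             # load has issued; r3 is reserved
print(can_issue(add, busy))    # True: r1, r5, r2 are free
busy.add(add.dest)             # add has issued; r1 is reserved
print(can_issue(sub, busy))    # False: r3 and r1 are busy
```

With in-order issue, the WAR case between `load` and `add` is harmless as long as operands are read at issue, so busy bits on destination registers cover the RAW and WAW cases that out-of-order completion exposes.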
Ideas to Reduce Stalls

Technique                                   Reduces                                Chapter
Dynamic scheduling                          Data hazard stalls                     3
Dynamic branch prediction                   Control stalls                         3
Issuing multiple instructions per cycle     Ideal CPI                              3
Speculation                                 Data and control stalls                3
Dynamic memory disambiguation               Data hazard stalls involving memory    3
Loop unrolling                              Control hazard stalls                  4
Basic compiler pipeline scheduling          Data hazard stalls                     4
Compiler dependence analysis                Ideal CPI and data hazard stalls       4
Software pipelining and trace scheduling    Ideal CPI and data hazard stalls       4
Compiler speculation                        Ideal CPI, data and control stalls     4
Instruction-Level Parallelism (ILP)
- Basic Block (BB) ILP is quite small
– BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
– average dynamic branch frequency 15% to 25% => 4 to 7 instructions execute between a pair of branches
– Plus instructions in BB likely to depend on each other
- To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks
- Simplest: loop-level parallelism to exploit parallelism among iterations of a loop
– Vector is one way
– If not vector, then either dynamic via branch prediction or static via loop unrolling by compiler
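Two points from this slide can be made concrete: the arithmetic behind "4 to 7 instructions between branches", and what loop unrolling does to a loop whose iterations are independent. The sketch below is illustrative only. The function names are hypothetical, and Python itself gains no ILP from unrolling; the code only shows the shape of the transformation a compiler would apply.

```python
def instructions_per_branch(branch_frequency):
    """Average number of instructions per branch for a given dynamic
    branch frequency (fraction of executed instructions that are branches)."""
    return 1.0 / branch_frequency


def add_scalar(x, s):
    """Rolled loop: one loop branch per element."""
    for i in range(len(x)):
        x[i] = x[i] + s
    return x


def add_scalar_unrolled4(x, s):
    """Same computation unrolled by 4: one loop branch per four elements.
    The four statements in the body are mutually independent."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        x[i] = x[i] + s
        x[i + 1] = x[i + 1] + s
        x[i + 2] = x[i + 2] + s
        x[i + 3] = x[i + 3] + s
        i += 4
    while i < n:  # cleanup for lengths not divisible by 4
        x[i] = x[i] + s
        i += 1
    return x


print(instructions_per_branch(0.25))            # 4.0
print(round(instructions_per_branch(0.15), 2))  # 6.67
print(add_scalar_unrolled4(list(range(10)), 5) == add_scalar(list(range(10)), 5))  # True
```

The unrolled body forms a larger basic block with fewer branches per element, which gives a scheduler more independent instructions to overlap.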