What Limits Performance? Stalls (Data Hazards)



Instruction Scheduling

Last time
– Register allocation
Today
– Instruction scheduling
– The problem: Pipelined computer architecture
– A solution: List scheduling

CS553 Lecture Instruction Scheduling 1

Background: Pipelining Basics

Idea
– Begin executing an instruction before completing the previous one

[Figure: Without Pipelining vs. With Pipelining; Instr 0 through Instr 4 plotted against time]

Idealized Instruction Data-Path

Instructions go through several stages of execution:

Stage 1: Instruction Fetch (IF)
Stage 2: Instruction Decode & Register Fetch (ID/RF)
Stage 3: Execute (EX)
Stage 4: Memory Access (MEM)
Stage 5: Register Write-back (WB)

IF ⇒ ID/RF ⇒ EX ⇒ MEM ⇒ WB

[Figure: pipeline diagram; six instructions, each passing through IF ID EX MM WB, each starting one cycle after the previous one]

Pipelining Details

Observations
– Individual instructions are no faster (but throughput is higher)
– Potential speedup is determined (more or less) by the number of stages
– Filling and draining the pipe limits speedup
– The rate through the pipe is limited by the slowest stage
– Less work per stage implies a faster clock

Modern Processors
– Long pipelines: 5 (Pentium), 14 (Pentium Pro), 22 (Pentium 4)
– Issue 2 (Pentium), 4 (UltraSPARC) or more (dead Compaq EV8) instructions per cycle
– Dynamically schedule instructions (from a limited instruction window) or statically schedule (e.g., IA-64)
– Speculate
  – Outcome of branches
  – Value of loads (research)
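The observations about throughput and about filling and draining the pipe can be made concrete with the usual idealization. The sketch below assumes every stage takes exactly one cycle and that no stalls occur; the function names are hypothetical, chosen only for illustration.

```python
def cycles_without_pipelining(n_instrs: int, n_stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instrs * n_stages


def cycles_with_pipelining(n_instrs: int, n_stages: int) -> int:
    # Ideal pipeline: the first instruction needs n_stages cycles to fill the
    # pipe, then one more instruction completes every cycle (no stalls assumed).
    if n_instrs == 0:
        return 0
    return n_stages + (n_instrs - 1)


def ideal_speedup(n_instrs: int, n_stages: int) -> float:
    # Ratio of unpipelined to pipelined cycle counts; approaches n_stages
    # as n_instrs grows, because the fill cost is paid only once.
    return cycles_without_pipelining(n_instrs, n_stages) / cycles_with_pipelining(n_instrs, n_stages)


for n in (6, 100, 1_000_000):
    print(n, cycles_with_pipelining(n, 5), round(ideal_speedup(n, 5), 3))
```

For the six instructions in the five-stage pipeline picture, this model gives 10 cycles instead of 30, a speedup of 3 rather than 5; only for long instruction streams does the speedup approach the number of stages.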

What Limits Performance?

Data hazards
– An instruction depends on the result of a prior instruction that is still in the pipe
Structural hazards
– Hardware cannot support certain instruction sequences because of limited hardware resources
Control hazards
– Control flow depends on the result of a branch instruction that is still in the pipe
An obvious solution
– Stall (insert bubbles into the pipeline)

Stalls (Data Hazards)

Code
    add $r1,$r2,$r3    // $r1 is the destination
    mul $r4,$r1,$r1    // $r4 is the destination

[Pipeline picture: add and mul, each passing through IF ID EX MM WB, plotted against time]

Stalls (Structural Hazards)

Code
    // Suppose multiplies take two cycles
    mul $r1,$r2,$r3
    mul $r4,$r5,$r6

[Pipeline picture: each mul passes through IF ID EX EX MM WB, plotted against time]

Stalls (Control Hazards)

Code
    // if $r1==0, branch to label
    bz  $r1, label
    add $r2,$r3,$r4

[Pipeline picture: bz and add, each passing through IF ID EX MM WB, plotted against time]
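A data hazard arises when an instruction reads a register that an instruction still in the pipe is about to write. The sketch below flags such (producer, consumer) pairs in a straight-line block. The `window` parameter is a hypothetical stand-in for how many following instructions would still see the value in flight; its real value depends on the pipeline and on whether forwarding is available.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dst: Optional[str]      # register written (None for stores and branches)
    srcs: Tuple[str, ...]   # registers read


def raw_hazards(block: List[Instr], window: int) -> List[Tuple[int, int]]:
    """Return (producer, consumer) index pairs where the consumer reads the
    producer's destination register at most `window` instructions later."""
    hazards = []
    for i, producer in enumerate(block):
        if producer.dst is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(block))):
            consumer = block[j]
            if producer.dst in consumer.srcs:
                hazards.append((i, j))
            if consumer.dst == producer.dst:
                break  # later instructions read the newer value instead
    return hazards


block = [
    Instr("add $r1,$r2,$r3", "$r1", ("$r2", "$r3")),
    Instr("mul $r4,$r1,$r1", "$r4", ("$r1", "$r1")),
]
print(raw_hazards(block, window=1))  # [(0, 1)]
```

Here mul reads $r1 in the very next slot after add writes it, so the pair is reported and the pipeline would have to stall (or forward) between them.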

Hardware Solutions

Data hazards
– Data forwarding (doesn't completely solve the problem)
– Runtime speculation (doesn't always work)
Structural hazards
– Hardware replication (expensive)
– More pipelining (doesn't always work)
Control hazards
– Runtime speculation (branch prediction)
Dynamic scheduling
– Can address all of these issues
– Very successful

Instruction Scheduling for Pipelined Architectures

Goal
– An efficient algorithm for reordering instructions to minimize pipeline stalls
Constraints
– Data dependences (for correctness)
– Hazards (can only have performance implications)
Possible Simplifications
– Do scheduling after instruction selection and register allocation
– Only consider data hazards

Data Dependences

Data dependence
– A data dependence is an ordering constraint on 2 statements
– When reordering statements, all data dependences must be observed to preserve program correctness
True (or flow) dependences
– Write to variable x followed by a read of x (read after write, or RAW)
    x = 5;
    print (x);
Anti-dependences
– Read of variable x followed by a write (write after read, or WAR)
    print (x);
    x = 5;
Output dependences
– Write to variable x followed by another write to x (write after write, or WAW)
    x = 6;
    x = 5;
Anti- and output dependences are false dependences.

Register Renaming

Idea
– Reduce false data dependences by reducing register reuse
– Give the instruction scheduler greater freedom
Example
    Original              Renamed               Reordered
    add $r1, $r2, 1       add $r1, $r2, 1       add $r1, $r2, 1
    st  $r1, [$fp+52]     st  $r1, [$fp+52]     mul $r11, $r3, 2
    mul $r1, $r3, 2       mul $r11, $r3, 2      st  $r1, [$fp+52]
    st  $r1, [$fp+40]     st  $r11, [$fp+40]    st  $r11, [$fp+40]
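The three kinds of data dependence can be told apart purely from which variables each statement writes and reads. The following sketch assumes each statement is summarized by hypothetical `defs` and `uses` sets and reports every kind of dependence from an earlier statement to a later one.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Stmt(NamedTuple):
    text: str
    defs: FrozenSet[str]  # variables written
    uses: FrozenSet[str]  # variables read


def stmt(text: str, defs: Iterable[str] = (), uses: Iterable[str] = ()) -> Stmt:
    return Stmt(text, frozenset(defs), frozenset(uses))


def dependence_kinds(earlier: Stmt, later: Stmt) -> Set[str]:
    """Kinds of data dependence from `earlier` to `later` (in program order)."""
    kinds = set()
    if earlier.defs & later.uses:
        kinds.add("RAW")  # true (flow) dependence
    if earlier.uses & later.defs:
        kinds.add("WAR")  # anti-dependence (false)
    if earlier.defs & later.defs:
        kinds.add("WAW")  # output dependence (false)
    return kinds


print(dependence_kinds(stmt("x = 5;", defs={"x"}), stmt("print (x);", uses={"x"})))  # {'RAW'}
```

A pair of statements with an empty result can be swapped freely; any non-empty result means the original order must be kept, although WAR and WAW on registers can be removed by renaming.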
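The renaming in the example can be mechanized for a single basic block: every redefinition of a register gets a fresh name, and later reads are redirected to it. This is a minimal sketch under stated assumptions: registers `$r11` and up are assumed free, and code after the block is assumed not to expect the renamed values under their original names. The `rename` helper and the tuple encoding are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# (opcode, destination register or None, source operands)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def rename(block: List[Instr], first_free: int = 11) -> List[Instr]:
    """Give every redefinition of a register a fresh name, removing WAR/WAW
    dependences on registers (assumes $r<first_free> and up are unused)."""
    latest: Dict[str, str] = {}  # original register -> name holding its current value
    defined: Set[str] = set()    # registers already written once in this block
    next_reg = first_free
    out: List[Instr] = []
    for op, dst, srcs in block:
        # Rewrite sources before the destination, so "addi $r4,1,$r4" reads the old value.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        new_dst = dst
        if dst is not None:
            if dst in defined:
                new_dst = f"$r{next_reg}"
                next_reg += 1
            defined.add(dst)
            latest[dst] = new_dst
        out.append((op, new_dst, new_srcs))
    return out


block: List[Instr] = [
    ("add", "$r1", ("$r2", "1")),
    ("st", None, ("$r1", "[$fp+52]")),
    ("mul", "$r1", ("$r3", "2")),
    ("st", None, ("$r1", "[$fp+40]")),
]
for op, dst, srcs in rename(block):
    print(op, ", ".join(([dst] if dst else []) + list(srcs)))
```

The output reproduces the renamed column of the example: only the second definition of `$r1` and the store that reads it change to `$r11`, which removes the anti- and output dependences that kept mul behind the first store.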

Phase Ordering Problem

Register allocation
– Tries to reuse registers
– Artificially constrains the instruction schedule
Just schedule instructions first?
– Scheduling can dramatically increase register pressure
Classic phase ordering problem
– Tradeoff between memory and parallelism
Approaches
– Consider allocation & scheduling together
– Run allocation & scheduling multiple times (schedule, allocate, schedule)

List Scheduling [Gibbons & Muchnick ’86]

Scope
– Basic blocks
Assumptions
– Pipeline interlocks are provided (i.e., the algorithm need not introduce no-ops)
– Pointers can refer to any memory address (i.e., no alias analysis)
– Hazards take a single cycle (stall); here let's assume there are two...
  – Load immediately followed by ALU op produces an interlock
  – Store immediately followed by load produces an interlock
Main data structure: dependence DAG
– Nodes represent instructions
– Edges (s1,s2) represent dependences between instructions
  – Instruction s1 must execute before s2
– Sometimes called a data dependence graph or data-flow graph

Dependence Graph Example

Sample code (operand order: dst, src, src)
    1  addi $r2,1,$r1
    2  addi $sp,12,$sp
    3  st   a, $r0
    4  ld   $r3,-4($sp)
    5  ld   $r4,-8($sp)
    6  addi $sp,8,$sp
    7  st   0($sp),$r2
    8  ld   $r5,a
    9  addi $r4,1,$r4

[Figure: dependence graph over instructions 1 to 9, with edges labeled by delay (1 or 2)]

Hazards in current schedule: (3,4), (5,6), (7,8), (8,9)
Any topological sort is okay, but we want the best one.

Scheduling Heuristics

Goal
– Avoid stalls
Consider these questions
– Does an instruction interlock with any immediate successors in the dependence graph? In other words, is the delay greater than 1?
– How many immediate successors does an instruction have?
– Is an instruction on the critical path?
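The hazard list for the sample code follows directly from the two interlock rules assumed for list scheduling. Judging from that list (for example, (5,6) counts as a hazard even though instruction 6 does not read `$r4`), the rules look only at the classes of adjacent instructions. The sketch below encodes that reading; the `KIND` table and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Instruction class of each numbered instruction in the sample code.
KIND: Dict[int, str] = {1: "alu", 2: "alu", 3: "st", 4: "ld", 5: "ld",
                        6: "alu", 7: "st", 8: "ld", 9: "alu"}


def interlock(first: str, second: str) -> bool:
    # The two assumed hazards: load then ALU op, store then load.
    return (first == "ld" and second == "alu") or (first == "st" and second == "ld")


def interlocks(schedule: List[int], kind: Dict[int, str]) -> List[Tuple[int, int]]:
    """Adjacent pairs in `schedule` that stall under the two-hazard model."""
    return [(a, b) for a, b in zip(schedule, schedule[1:]) if interlock(kind[a], kind[b])]


print(interlocks([1, 2, 3, 4, 5, 6, 7, 8, 9], KIND))  # [(3, 4), (5, 6), (7, 8), (8, 9)]
```

Because the check is just a scan over adjacent pairs, it can score any candidate topological order cheaply, which is what the scheduling heuristics try to minimize.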

Scheduling Heuristics (cont)

Idea: schedule an instruction earlier when...
– It does not interlock with the previously scheduled instruction (avoid stalls)
– It interlocks with its successors in the dependence graph (may enable successors to be scheduled without stall)
– It has many successors in the graph (may enable successors to be scheduled with greater flexibility)
– It is on the critical path (the goal is to minimize time, after all)

Scheduling Algorithm

    Build dependence graph G
    Candidates ← set of all roots (nodes with no in-edges) in G
    while Candidates ≠ ∅
        Select instruction s from Candidates   {Using heuristics, in order}
        Schedule s
        Candidates ← Candidates − s
        Candidates ← Candidates ∪ "exposed" nodes
            {Add to Candidates those nodes whose predecessors have all been scheduled}

Scheduling Example

[Figure: dependence graph for the sample code and the Candidates set as each instruction is scheduled]

Hazards in new schedule: (8,1)

Scheduling Example (cont)

[Figure: dependence graph with rows 1 addi, 2 addi, 3 st / 4 ld, 5 ld, 8 ld / 6 addi, 9 addi / 7 st]

    Original code          Scheduled code
    1  addi $r2,1,$r1      3  st   a, $r0
    2  addi $sp,12,$sp     2  addi $sp,12,$sp
    3  st   a, $r0         5  ld   $r4,-8($sp)
    4  ld   $r3,-4($sp)    4  ld   $r3,-4($sp)
    5  ld   $r4,-8($sp)    8  ld   $r5,a
    6  addi $sp,8,$sp      1  addi $r2,1,$r1
    7  st   0($sp),$r2     6  addi $sp,8,$sp
    8  ld   $r5,a          7  st   0($sp),$r2
    9  addi $r4,1,$r4      9  addi $r4,1,$r4

Hazards in original schedule: (3,4), (5,6), (7,8), (8,9)
Hazards in new schedule: (8,1)
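The scheduling algorithm and the heuristics above can be sketched as follows. Several details are assumptions, not taken from the slides: the dependence edges in `SUCC` are a hypothetical reconstruction (the graph figure did not survive), chosen to be consistent with the scheduled code (notably, it has no edge from 7 to 8); the heuristics are applied in the listed order as a lexicographic priority tuple; the critical-path height counts 2 for an interlocking edge and 1 otherwise; and remaining ties go to the instruction that came first in the original code.

```python
from typing import Dict, List, Optional, Set, Tuple

KIND: Dict[int, str] = {1: "alu", 2: "alu", 3: "st", 4: "ld", 5: "ld",
                        6: "alu", 7: "st", 8: "ld", 9: "alu"}

# Hypothetical dependence edges (s1 must execute before s2).
SUCC: Dict[int, List[int]] = {1: [7], 2: [4, 5, 6], 3: [8], 4: [6], 5: [6, 9],
                              6: [7], 7: [], 8: [], 9: []}


def interlock(first: str, second: str) -> bool:
    # Assumed hazards: load then ALU op, store then load.
    return (first == "ld" and second == "alu") or (first == "st" and second == "ld")


def list_schedule(succ: Dict[int, List[int]], kind: Dict[int, str]) -> List[int]:
    preds: Dict[int, Set[int]] = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)

    height: Dict[int, int] = {}

    def h(n: int) -> int:
        # Longest delay-weighted path from n to a leaf (critical-path heuristic).
        if n not in height:
            height[n] = max(
                ((2 if interlock(kind[n], kind[s]) else 1) + h(s) for s in succ[n]),
                default=0,
            )
        return height[n]

    def priority(n: int, prev: Optional[int]) -> Tuple:
        return (
            prev is None or not interlock(kind[prev], kind[n]),  # 1. no stall after prev
            any(interlock(kind[n], kind[s]) for s in succ[n]),   # 2. interlocks with a successor
            len(succ[n]),                                        # 3. number of successors
            h(n),                                                # 4. critical-path height
            -n,                                                  # tie-break: original order
        )

    candidates = {n for n in succ if not preds[n]}  # roots of the DAG
    done: Set[int] = set()
    order: List[int] = []
    prev: Optional[int] = None
    while candidates:
        s = max(candidates, key=lambda n: priority(n, prev))
        order.append(s)
        done.add(s)
        candidates.remove(s)
        # Expose successors whose predecessors have now all been scheduled.
        for t in succ[s]:
            if preds[t] <= done:
                candidates.add(t)
        prev = s
    return order


print(list_schedule(SUCC, KIND))  # [3, 2, 5, 4, 8, 1, 6, 7, 9]
```

With these assumed edges the sketch reproduces the scheduled code above, leaving only the (8,1) interlock. The first heuristic dominates at every step, so a later heuristic only matters when several candidates avoid (or all suffer) a stall after the previous instruction.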
