



Lecture 7: Speculative Execution and Recovery

Branch prediction and speculative execution, precise interrupt, reorder buffer

Control Dependencies

Every instruction is control dependent on some set of branches.

    if p1
        S1;
    if p2
        S2;

S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

Control dependencies must be preserved to preserve program order.

Control Dependence Ignored

Example: if the CPU stalls on branches, how much would CPI increase?

    for (i=0; i<1000; i++)
        C[i] = A[i]+B[i];

Control dependence need not be preserved in the whole execution:
- We are willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program.

Two properties critical to program correctness are data flow and exception behavior.

Branch Prediction and Speculative Execution

Speculation is to run instructions on a prediction; predictions could be wrong.
- Branch prediction: cannot be avoided, could be very accurate
- Mis-prediction is a less frequent event, but can we ignore it?

Branch prediction: predict the execution as accurately as possible (frequent cases).
Speculative execution recovery: if the prediction is wrong, roll the execution back.

Exception Behavior

Preserving exception behavior: exceptions must be raised exactly as in sequential execution.
- Same sequences
- No "extra" exceptions

Example:

        DADDU R2,R3,R4
        BEQZ  R2,L1
        LW    R1,0(R2)
    L1:

Problem with moving LW before BEQZ?

Again, a dynamic execution must look like a sequential execution, any time when it is stopped.

Precise Interrupts

Tomasulo had: in-order issue, out-of-order execution, and out-of-order completion.

Need to "fix" the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.
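The question on the "Control Dependence Ignored" slide (how much CPI increases if the CPU stalls on branches) can be worked through with a small calculation. The sketch below is only illustrative: the base CPI, branch fraction, penalty, and misprediction rate are hypothetical numbers that do not come from the lecture, and the two helper functions are names introduced here.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


def cpi_with_prediction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """CPI when only mispredicted branches pay a recovery penalty."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload parameters (assumptions, not from the slides).
BASE_CPI = 1.0          # ideal CPI with no branch penalty
BRANCH_FRACTION = 0.2   # one instruction in five is a branch
PENALTY = 3             # cycles lost per stall or per misprediction
MISPREDICT_RATE = 0.05  # predictor is right 95% of the time

stalled = cpi_with_branch_stalls(BASE_CPI, BRANCH_FRACTION, PENALTY)
predicted = cpi_with_prediction(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, PENALTY)

print(f"stall on every branch:   CPI = {stalled:.2f}")
print(f"predict, 5% mispredicts: CPI = {predicted:.2f}")
```

Under these assumed numbers, stalling on every branch costs far more than occasionally rolling back, which is the motivation for predicting the frequent case and paying for recovery only on a mis-prediction.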

The Hardware: Reorder Buffer

If insts write results in program order, reg/memory always get the correct values.

Reorder buffer (ROB): reorder out-of-order insts to program order at the time of writing reg/memory (commit).
- If some inst goes wrong, handle it at the time of commit: just flush the insts afterwards.
- An inst cannot write reg/memory immediately after execution, so the ROB also buffers the results. There is no such place in the original Tomasulo.

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, FU1, FU2, DM]

Branch Prediction vs. Precise Interrupt

Mis-prediction is an "exception" on the branch inst.

Execution "branches out" on exceptions:
- Every instruction is "predicted" not to take the "branch" to the interrupt handler.

Same technique for handling both issues: in-order completion or commit, i.e. change register/memory only in program order (sequential).

How does it ensure the correctness?

Reorder Buffer Details

[Figure: reorder buffer entry fields: Program Counter, Branch or L/W?, Exceptions?, Dest reg, Ready?, Result]

Holds branch valid and exception bits
- Flush pipeline when any bit is set
- What do the architectural states look like after the flushing?

Holds dest, result and PC
- Write results to dest at the time of commit
- Which PC to hold?
- A ready bit (not shown) indicates if the result is ready

Supplies operands between execution complete and commit

Four Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called "dispatch")
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue")
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
   When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called "graduation")

Speculative Execution Recovery

Flush the pipeline on mis-prediction
- MIPS 5-stage pipeline used flushing on taken branches

Where is the flush signal from?
When to flush?
Which components are flushed?

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, FU1, FU2, DM]

Changes to Other Components

Use ROB index as tag
- Why not RS index any more?
- Why is ROB index a valid choice?

Renaming table maps architecture registers to ROB index if the register is renamed.

Reservation stations now use ROB index for tracking dependence and for wakeup.

Again, tag (now ROB index) and data are broadcast on CDB at writeback.

Inst may receive values from reg/mem, data broadcasting, or ROB.
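The commit and flush behaviour described on the reorder buffer slides can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the lecture's hardware: `RobEntry`, `ReorderBuffer`, and their methods are hypothetical names, a single `flush` flag stands in for both the branch-mispredict and exception bits, and a flagged instruction is simply discarded together with everything younger.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: Optional[str]          # architectural destination, None for branches
    value: Optional[int] = None  # buffered result, filled in at write-result
    ready: bool = False          # set once execution has finished
    flush: bool = False          # mispredicted branch or exception (assumed single flag)


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # leftmost entry is the oldest instruction
        self.regfile = {}        # architectural state, changed only at commit

    def issue(self, dest):
        """Allocate an entry at the tail, in program order."""
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, flush=False):
        """Record a finished instruction; may happen in any order."""
        entry.value, entry.ready, entry.flush = value, True, flush

    def commit(self):
        """Retire ready instructions from the head; return how many updated state."""
        committed = 0
        while self.entries and self.entries[0].ready:
            head = self.entries.popleft()
            if head.flush:
                self.entries.clear()   # squash every younger instruction
                break
            if head.dest is not None:
                self.regfile[head.dest] = head.value
            committed += 1
        return committed


rob = ReorderBuffer()
older = rob.issue("R2")      # slow instruction before the branch
branch = rob.issue(None)     # predicted branch
younger = rob.issue("R1")    # issued speculatively past the branch

rob.write_result(younger, value=99)   # completes out of order
first = rob.commit()                  # head not ready, nothing retires

rob.write_result(branch, flush=True)  # branch turns out mispredicted
rob.write_result(older, value=7)
second = rob.commit()                 # older retires, then the flush squashes younger

print(first, second, rob.regfile)
```

In this sketch the speculative write to R1 never reaches the register file, because results stay buffered in the ROB until the instruction reaches the head.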

Summary

Reservation stations: implicit register renaming to a larger set of registers + buffering source operands
- Prevents registers as bottleneck
- Avoids WAR, WAW hazards of Scoreboard

Not limited to basic blocks when compared to static scheduling (integer unit gets ahead, beyond branches)

Today, helps cache misses as well
- Don't stall for L1 Data cache miss (insufficient ILP for L2 miss?)
- Can support memory-level parallelism

Lasting Contributions
- Dynamic scheduling
- Register renaming
- Load/store disambiguation (discuss later)

360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264

Code Example

    Loop: LD     R2, 0(R1)
          DADDIU R2, R2, #1
          SD     R2, 0(R1)
          DADDIU R1, R1, #4
          BNE    R2, R3, Loop

How would this code be executed?

    Inst   Issue   Exec   Memory read   Write results   Commit
    LD     1       2      3             4               5
    …      …       …      …             …               …
    …      …       …      …             …               …

Dynamic Scheduling: The Only Choice?

Most high-performance processors today are dynamically scheduled superscalar processors
- With deeper and n-way issue pipeline

Other alternatives to exploit instruction-level parallelism
- Statically scheduled superscalar
- VLIW

Mixed effort: EPIC (Explicit Parallel Instruction Computing)
- Example: Intel Itanium processors

Why is dynamic scheduling so popular today?
- Technology trends: increasing transistor budget, deeper pipeline, wide issue
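As a partial answer to "How would this code be executed?", the register renaming mentioned in the Summary can be applied to one iteration of the loop from the Code Example slide. The sketch below is a simplification under stated assumptions: each instruction takes the ROB index equal to its position in the loop body, nothing commits during renaming, and `rename` and `LOOP_BODY` are hypothetical names introduced for illustration.

```python
def rename(program):
    """For each instruction, return (opcode, ROB tag, where each source comes from)."""
    rename_table = {}   # architectural register -> ROB index of its latest writer
    renamed = []
    for rob_index, (op, dest, sources) in enumerate(program):
        # Look up sources before renaming the destination, so that
        # "DADDIU R2, R2, #1" reads the older value of R2.
        tags = [f"ROB{rename_table[r]}" if r in rename_table else r for r in sources]
        if dest is not None:
            rename_table[dest] = rob_index
        renamed.append((op, f"ROB{rob_index}", tags))
    return renamed


# One iteration of the loop: (opcode, destination register, source registers).
LOOP_BODY = [
    ("LD",     "R2", ["R1"]),
    ("DADDIU", "R2", ["R2"]),
    ("SD",     None, ["R2", "R1"]),
    ("DADDIU", "R1", ["R1"]),
    ("BNE",    None, ["R2", "R3"]),
]

result = rename(LOOP_BODY)
for op, tag, sources in result:
    print(f"{tag}: {op:<6} reads {', '.join(sources)}")
```

In this sketch the two writes to R2 receive different tags, so the WAW hazard disappears, and SD still reads the old R1 from the register file even though the following DADDIU renames R1, which removes the WAR hazard.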
