



  1. NOW Handout Page 1

     EECS 252 Graduate Computer Architecture
     Lec 5 – Out-of-Order Completion
     David Culler
     Electrical Engineering and Computer Sciences
     University of California, Berkeley
     http://www.eecs.berkeley.edu/~culler
     http://www-inst.eecs.berkeley.edu/~cs252
     2/1/2005 CS252 SP05, Lec 5 OOC

     Review
     • Data stationary pipeline control
       – Micro-instruction & PC track down the pipe
       – Accumulate state
     • Implementing bubbles, stalls, forwarding, multicycle operations
     • Branch prediction
       – Static vs dynamic
       – N-bit saturating counters
       – Local and global history
       – Correlated predictors, Tournament, GSHARE
       – Branch target buffers, return address predictors

     Outline
     • Relax pipeline design to allow out-of-order completions
       – Cray-1: register reservations
     • Relax pipeline to allow out-of-order issue
       – CDC 6600: Scoreboard
     • Compiler optimizations for ILP
     • Superscalar issue
     • Maybe go back and finish exceptions

     Pipelining with Reg. Reservations
     • Assumptions
       1. Multiple pipelined function units of different latency
          » able to accept operations at issue rate
          » may be exceptions (e.g., divide)
       2. Issue instructions in order
       3. Operand fetch in order
       4. Completion out of order
          » short ops may bypass long ones
       5. Some shared resources (e.g., reg write port)
     • Implications
       – WAR hazard still resolved by pipeline flow (2 & 3)
       – RAW, WAW, and structural still present
     • Design philosophy (ala Cray)
       – Resolve hazards as instruction is issued into pipeline
       – Pipeline is non-blocking

     Resolving Structural Hazards
     • With static pipeline flow, resource usage is known in advance
     • Instruction requires X at t ticks after issue
     • If reservation X[t] is clear, issue inst and set bit
     • Otherwise, delay till clear
     • At each tick the reservation X[] shifts by one, so will eventually clear
     • Multiple resources? Range of delays?
     [Figure: "shift reg." for resource X; labels: NOW, delay till required resource is used]

     Basic Issue Model
     • Issue unit checks for all hazards
       – Structural, RAW, WAW
     • Holds issue while hazards exist
     • Upon issue, register values provided to F.U.
     • Executes to completion without blocking
     [Figure: Instr. Fetch; Op Fetch & Issue; op valA valB rD]
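The reservation rule on the Resolving Structural Hazards slide (check X[t], set it on issue, shift every tick) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names, the window depth of 8, and the one-issue-per-tick limit are hypothetical choices, not details taken from the slides or from the Cray-1.

```python
class ReservationShiftRegister:
    """Reservation bits for one shared resource X (hypothetical model).

    bits[t] is True when X is already claimed t ticks from now.
    """

    def __init__(self, depth: int = 8):
        # depth is an assumed window size, not a figure from the slides
        self.bits = [False] * depth

    def try_issue(self, t: int) -> bool:
        """Claim X at t ticks after issue; return False if already reserved."""
        if not 0 <= t < len(self.bits):
            raise ValueError("offset outside the reservation window")
        if self.bits[t]:
            return False
        self.bits[t] = True
        return True

    def tick(self) -> None:
        """Advance one clock: every reservation moves one slot closer to now."""
        self.bits = self.bits[1:] + [False]


def issue_ticks(offsets, depth: int = 8):
    """Issue instructions in order, at most one per tick (assumed).

    offsets[i] is the tick offset at which instruction i needs X.
    Returns the tick at which each instruction issues.
    """
    resv = ReservationShiftRegister(depth)
    now = 0
    issued = []
    for t in offsets:
        while not resv.try_issue(t):  # otherwise, delay till clear
            resv.tick()
            now += 1
        issued.append(now)
        resv.tick()  # the next instruction issues no earlier than the next tick
        now += 1
    return issued
```

With offsets [3, 2, 2] the sketch returns [0, 2, 3]: the second instruction would need X at the same tick as the first, so it is held for one tick, and the third then issues without delay. Handling several resources would take one such register per resource, which is one reading of the "Multiple resources?" question on the slide.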

  2. NOW Handout Page 2

     Hazard Resolution
     • Structural
       – Op code => resource usage
       – Check resource resv
       – Set on issue
     • Data
       – Add a reservation bit on each register
       – Check RegRsv for source and destination registers
       – Hold issue till clear
       – Set bit on destination register
       – Clear bit on dest reg. write
     • Questions:
       – Forwarding?
     Motorola 88000 "scoreboard" [sic]
     [Figure: Instr. Fetch; Op Fetch & Issue; op valA valB rD]

     Example
       Add r1 := r2 + r3
       Add r2 := r2 + 4
       Lod r5 := mem[r1+16]
       Lod r6 := mem[r1+32]
       Mul r7 := r5 * r6
       Bnz r1, foo
       Sub r7 := r0 – r0
     [Figure: Instr. Fetch; Op Fetch & Issue; op valA valB rD]

     Cray-1 Discussion
     • Technological Assumptions
     • Why no forwarding?
     • Longevity of the ISA?
     • Instruction cache?
       – Four blocks (RR) of 16x4 "parcels"
       – Issue delayed on miss
         » 2 CP for change of block
     • Branch delays?
       – Branch op code delayed till second parcel is obtained
       – 5 clocks (reg zero, nz, pos, neg)
     • I/O system?

     Pipelining with Scoreboarding
     • Assumptions
       1. Multiple function units of different latency
          – Especially non-pipelined units
       2. Issue instructions whenever FU available, unless it would cause multiple outstanding writes to the same register
          – Operand fetch out of order
          – Completion out of order
       3. Some shared resources (e.g., reg write port)
     • Implications
       – Need to resolve RAW, WAR, WAW and structural
     • Design philosophy (ala CDC 6600)
       – Issue unit tracks all outstanding dependences
       – Holds issue if structural or WAW hazard
       – Informs FUs when hazards resolved
       – FUs fetch operands from register file and proceed

     Scoreboard Operation
     • Issue
       – Hold while FU unavailable or destination register reserved (by FU f)
     • Read operands
       – SB informs FU with all sources available to fetch & go
       – Limited by read ports
     • Write back
       – SB schedules one FU to write
       – Waits until no FU is still waiting to fetch the (old version of the) reg
     [Figure: Instr. Fetch; Scoreboard Issue & Resolve; FU; op rA rB rD; op fetch; ex; valA valB rD]

     Example
       Add r1 := r2 + r3
       Add r2 := r2 + 4
       Lod r5 := mem[r1+16]
       Lod r6 := mem[r1+32]
       Mul r7 := r5 * r6
       Bnz r1, foo
       Sub r7 := r0 – r0
     [Figure: Instr. Fetch; Scoreboard Issue & Resolve; FU]
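The register reservation rules from the Hazard Resolution slide (hold issue until the reservation bits of all source and destination registers are clear, set the bit on the destination at issue, clear it on the register write) can be traced on the example instruction sequence with a small Python sketch. The latencies below are invented for illustration and are not Cray-1 figures; the names `Instr`, `PROGRAM`, `LATENCY`, and `reservation_issue_ticks` are hypothetical, issue is assumed to be in order at one instruction per tick, and the branch is simply treated as not taken.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # None for instructions that write no register
    srcs: Tuple[str, ...]


# The example sequence from the slides.
PROGRAM = [
    Instr("Add", "r1", ("r2", "r3")),
    Instr("Add", "r2", ("r2",)),
    Instr("Lod", "r5", ("r1",)),
    Instr("Lod", "r6", ("r1",)),
    Instr("Mul", "r7", ("r5", "r6")),
    Instr("Bnz", None, ("r1",)),
    Instr("Sub", "r7", ("r0", "r0")),
]

# Assumed latencies in ticks from issue to register write (illustrative only).
LATENCY = {"Add": 2, "Sub": 2, "Lod": 4, "Mul": 5, "Bnz": 1}


def reservation_issue_ticks(program, latency):
    """Return the tick at which each instruction issues.

    clear_at[r] is the tick at which register r's reservation bit clears,
    i.e. the tick of the pending write to r.
    """
    clear_at = {}
    now = 0
    issued = []
    for ins in program:
        regs = set(ins.srcs)
        if ins.dest is not None:
            regs.add(ins.dest)  # checking the destination covers WAW
        # Hold issue till every checked reservation bit is clear.
        while any(clear_at.get(r, 0) > now for r in regs):
            now += 1
        issued.append(now)
        if ins.dest is not None:
            clear_at[ins.dest] = now + latency[ins.op]  # set bit on destination
        now += 1  # in-order issue, one instruction per tick
    return issued
```

Under these assumed latencies the issue ticks come out as [0, 1, 2, 3, 7, 8, 12]: the Mul is held for the two loads (RAW), and the final Sub is held until the Mul has written r7 (WAW). Since nothing is forwarded, a dependent instruction waits for the register write itself, which is the point behind the "Forwarding?" question.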

  3. NOW Handout Page 3

     Discussion
     • Technological Assumptions
     • Extend to allow forwarding?
     • How do loads and stores work?
     • Instruction cache?
     • I/O system?

     Case Study: MIPS R4000 (200 MHz)
     [Figure: pipeline IF IS RF EX DF DS TC WB drawn over instr mem, reg, ALU, data mem, reg]
     • 8 Stage Pipeline:
       – IF: first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
       – IS: second half of access to instruction cache.
       – RF: instruction decode and register fetch, hazard checking and also instruction cache hit detection.
       – EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
       – DF: data fetch, first half of access to data cache.
       – DS: second half of access to data cache.
       – TC: tag check, determine whether the data cache access hit.
       – WB: write back for loads and register-register operations.
     • 8 Stages: What is impact on Load delay? Branch delay? Why?

     Case Study: MIPS R4000
     TWO Cycle Load Latency
       IF IS RF EX DF DS TC WB
          IF IS RF EX DF DS TC
             IF IS RF EX DF DS
                IF IS RF EX DF
                   IF IS RF EX
                      IF IS RF
                         IF IS
                            IF
     THREE Cycle Branch Latency (conditions evaluated during EX phase)
       IF IS RF EX DF DS TC WB
          IF IS RF EX DF DS TC
             IF IS RF EX DF DS
                IF IS RF EX DF
                   IF IS RF EX
                      IF IS RF
                         IF IS
                            IF
     Delay slot plus two stalls
     Branch likely cancels delay slot if not taken

     MIPS R4000 Floating Point
     • FP Adder, FP Multiplier, FP Divider
     • Last step of FP Multiplier/Divider uses FP Adder HW
     • 8 kinds of stages in FP units:

       Stage  Functional unit  Description
       A      FP adder         Mantissa ADD stage
       D      FP divider       Divide pipeline stage
       E      FP multiplier    Exception test stage
       M      FP multiplier    First stage of multiplier
       N      FP multiplier    Second stage of multiplier
       R      FP adder         Rounding stage
       S      FP adder         Operand shift stage
       U                       Unpack FP numbers

     R4000 Performance
     • Not ideal CPI of 1:
       – Load stalls (1 or 2 clock cycles)
       – Branch stalls (2 cycles + unfilled slots)
       – FP result stalls: RAW data hazard (latency)
       – FP structural stalls: Not enough FP hardware (parallelism)
     [Figure: stacked bar chart, vertical axis 0 to 4.5; benchmarks eqntott, espresso, gcc, li, doduc, nasa7, ora, spice2g6, su2cor, tomcatv; series: Base, Load stalls, Branch stalls, FP result stalls, FP structural stalls]

     MIPS FP Pipe Stages

       FP Instr        1  2    3          4     5  6    7    8 …
       Add, Subtract   U  S+A  A+R        R+S
       Multiply        U  E+M  M          M     M  N    N+A  R
       Divide          U  A    R          D^28  …  D+A  D+R, D+R, D+A, D+R, A, R
       Square root     U  E    (A+R)^108  …     A  R
       Negate          U  S
       Absolute value  U  S
       FP compare      U  A    R

     Stages: A Mantissa ADD stage; D Divide pipeline stage; E Exception test stage; M First stage of multiplier; N Second stage of multiplier; R Rounding stage; S Operand shift stage; U Unpack FP numbers
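The "FP structural stalls" item on the R4000 Performance slide can be made concrete with the stage patterns from the MIPS FP Pipe Stages table. The Python sketch below checks at which issue distances an Add would need a stage in the same cycle as an earlier Multiply. It assumes that each stage letter names a single, unreplicated piece of hardware and that instructions issue in order, one per cycle; the function names are hypothetical.

```python
# Stage usage per cycle after issue, copied from the FP Pipe Stages table.
ADD = ["U", "S+A", "A+R", "R+S"]
MUL = ["U", "E+M", "M", "M", "M", "N", "N+A", "R"]


def stage_sets(pattern):
    """Turn ['U', 'S+A', ...] into [{'U'}, {'S', 'A'}, ...]."""
    return [set(entry.split("+")) for entry in pattern]


def conflicts(first, second, delay):
    """List (cycle, stages) pairs where both instructions need the same stage.

    `second` issues `delay` cycles after `first`; cycle numbers are counted
    from 1 in the frame of `first`, matching the table's column headers.
    """
    a = stage_sets(first)
    b = stage_sets(second)
    found = []
    for j, used in enumerate(b):
        i = j + delay  # cycle index of `first` that overlaps cycle j of `second`
        if i < len(a):
            shared = a[i] & used
            if shared:
                found.append((i + 1, sorted(shared)))
    return found


def conflicting_delays(first, second):
    """Issue distances (at least 1 cycle) at which `second` would collide."""
    return [d for d in range(1, len(first)) if conflicts(first, second, d)]
```

For a Multiply followed by an Add, `conflicting_delays(MUL, ADD)` gives [4, 5]: at those distances the Add needs the adder stages A and R in the cycles where the Multiply is using them for its final N+A and R steps. This follows from the slide's note that the last step of the FP multiplier uses the FP adder hardware.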
