
NOW Handout Page 1 Multicycle stages Historical Perspective: - PDF document



  1. EECS 252 Graduate Computer Architecture
     Lec 4 – Issues in Basic Pipelines (stalls, exceptions, branch prediction)
     David Culler
     Electrical Engineering and Computer Sciences
     University of California, Berkeley
     http://www.eecs.berkeley.edu/~culler
     http://www-inst.eecs.berkeley.edu/~cs252
     1/27/2005 CS252S05 L4 Pipe Issues

     Debate Review: ISA - a Critical Interface
     [Figure: software (Prog. Lang., Compiler, Operating Systems) / instruction set / hardware]
     • Extremely well defined abstraction
     • Huge, quantitative base of usage data for real applications filtered through SOA compiler technology
     • Huge quantitative base of implementation costs and performance
     • Convergence trend – enable optimizations, support HLL, OS support, contain complexity
     • Lots of marketing (ignores, misuses, or selective use of established data)
     • Worse when we get beyond scalar operations
     • Translation / Interpretation boundary becoming less sharp
     • Properties of a good abstraction
       – Lasts through many generations (portability)
       – Used in many different ways (generality)
       – Provides convenient functionality to higher levels
       – Permits an efficient implementation at lower levels

     Discussion Exercise
     • In terms of the "iron triangle", what are the performance implications of condition-codes?

     Ordering Properties of basic inst. pipeline
     [Figure: pipeline diagram, Time (clock cycles) Cycle 1 to Cycle 7 against Instr. Order; each instruction passes through Ifetch, Reg, ALU, DMem, Reg; labels: Issue, Complete, Execution window]
     • Instructions issued in order
     • Operand fetch is stage 2 => operand fetched in order
     • Write back in stage 5 => no WAW, no WAR hazards
     • Common pipeline flow => operands complete in order
     • Stage changes only at "end of instruction"

     Control Pipeline
     [Figure: pipelined datapath and control; labels: nPC, Next PC, I-fetch, PC, PC1, PC2, PC3, +4, IR, dcd, Ra, Rb, Rr, imed, Registers, Op A, Op B, mux, ALU, A-res, B-byp, D-Mem, MEM-res, brch, kill, fwd ctrl, op, aop, mop, wop]

     What does forwarding do?
     [Figure: pipeline diagram, Time (clock cycles) against Instr. Order, for the sequence below; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
       add r1,r2,r3
       sub r4,r1,r3
       and r6,r1,r7
       or  r8,r1,r9
       xor r10,r1,r4
     • Destination register is a name for instr's result
     • Source registers are names for sources
     • Forwarding logic builds data dependence (flow) graph for instructions in the execution window
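The forwarding slide's last point, that forwarding logic builds the data dependence (flow) graph for the instructions in the execution window, can be illustrated with a minimal Python sketch. It uses the five instructions from the slide; the tuple encoding `(opcode, destination, sources)` and the helper name `flow_edges` are assumptions made for illustration and are not part of the lecture.

```python
# Hypothetical encoding of the slide's instruction sequence:
# (opcode, destination register, source registers).
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r4")),
]


def flow_edges(program):
    """Return read-after-write (flow) edges as (producer, consumer, register).

    Each source register is matched to the most recent earlier instruction
    that wrote it, which is the value a bypass path would have to supply.
    """
    last_writer = {}  # register name -> index of the latest instruction writing it
    edges = []
    for index, (_opcode, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                edges.append((last_writer[reg], index, reg))
        last_writer[dest] = index
    return edges


for producer, consumer, reg in flow_edges(PROGRAM):
    distance = consumer - producer
    print(f"{PROGRAM[producer][0]} -> {PROGRAM[consumer][0]} via {reg} "
          f"(distance {distance})")
```

Every later instruction reads r1 from `add`, and `xor` also reads r4 from `sub`. The distance between producer and consumer is what decides which bypass path, if any, has to supply the value.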

  2. Multicycle stages
     [Figure: Datapath Stage with Pipeline Control Reg; labels: Nxt, Micro-sequencer control, Stall, Datapath control]
     • Stage microsequencer spits micro-ops into the pipe

     Historical Perspective: Microprogramming
     [Figure: Main Memory holds the user program plus data as "macro-instructions" (ADD, SUB, AND, ..., DATA); the CPU holds an execution unit and a control memory; "one of these is mapped into a sequence of these"]
     Writable-control store?
     Supported complex instructions as a sequence of simple micro-inst (RTs)
     Pipelined micro-instruction processing, but very limited view. Could not reorganize macroinstructions to enable pipelining

     Branch prediction
     • Datapath parallelism only useful if you can keep it fed.
     • Easy to fetch multiple (consecutive) instructions per cycle
       – essentially speculating on sequential flow
     • Jump: unconditional change of control flow
       – Always taken
     • Branch: conditional change of control flow
       – Taken about 50% of the time
       – Backward: 30% x 80% taken
       – Forward: 70% x 40% taken

     Typical "simple" Pipeline
     • Example: MIPS R4000
     [Figure: IF, ID, then parallel units, then MEM, WB; integer unit (ex); FP/int Multiply (m1 to m7); FP adder (a1 to a4); FP/int divider Div (lat = 25, Init inv = 25)]

     Case for Branch Prediction when Issue N instructions per clock cycle
     1. Branches will arrive up to n times faster in an n-issue processor
     2. Amdahl's Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor
     conversely, need branch prediction to 'see' potential parallelism

     A Big Idea for Today
     • Reactive: past actions cause system to adapt use
       – do what you did before better
       – ex: caches
       – TCP windows
       – URL completion, ...
     • Proactive: uses past actions to predict future actions
       – optimize speculatively, anticipate what you are about to do
       – branch prediction
       – long cache blocks
       – ???
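Two pieces of arithmetic on this page can be checked with a short Python sketch: the "taken about 50% of the time" figure, which follows from the backward and forward percentages on the branch prediction slide, and the Amdahl's Law point that control stalls matter more as the base CPI drops. The function names and the branch frequency, misprediction rate, and penalty used below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def taken_fraction(backward_share=0.30, backward_taken=0.80,
                   forward_share=0.70, forward_taken=0.40):
    """Overall fraction of branches taken, from the slide's percentages."""
    return backward_share * backward_taken + forward_share * forward_taken


def stall_share(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Fraction of total CPI spent on branch misprediction stalls.

    Simple additive model (an assumption): CPI = base_cpi + stall cycles
    per instruction, where stalls = frequency * miss rate * penalty.
    """
    stall_cpi = branch_freq * mispredict_rate * penalty_cycles
    return stall_cpi / (base_cpi + stall_cpi)


print(f"fraction of branches taken: {taken_fraction():.2f}")

# Hypothetical workload: 20% branches, 10% mispredicted, 3-cycle penalty.
for issue_width in (1, 2, 4):
    base_cpi = 1.0 / issue_width  # ideal CPI of an n-issue machine
    share = stall_share(base_cpi, 0.20, 0.10, 3)
    print(f"{issue_width}-issue: {share:.1%} of cycles lost to branch stalls")
```

With the same stall cycles per instruction, the share of time lost grows as the issue width rises, which is the sense in which control stalls have a larger relative impact at lower potential CPI.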

  3. Branch Prediction Schemes
     0. Static Branch Prediction
     • 1-bit Branch-Prediction Buffer
     • 2-bit Branch-Prediction Buffer
     • Correlating Branch Prediction Buffer
     • Tournament Branch Predictor
     • Branch Target Buffer
     • Integrated Instruction Fetch Units
     • Return Address Predictors

     Dynamic Branch Prediction
     • Performance = ƒ(accuracy, cost of misprediction)
     • Branch History Table: Lower bits of PC address index table of 1-bit values
       – Says whether or not branch taken last time
       – No address check
         » saves HW, but may not be right branch
       – If inst == BR, update table with outcome
     • Problem: in a loop, 1-bit BHT will cause 2 mispredictions
       – End of loop case, when it exits instead of looping as before
       – First time through loop on next time through code, when it predicts exit instead of looping
       – avg is 9 iterations before exit
       – Only 80% accuracy even if loop 90% of the time
     • Local history
       – This particular branch inst
         » Or one that maps into same lost PC

     2-bit Dynamic Branch Prediction (J. Smith, 1981)
     • 2-bit scheme where change prediction only if get misprediction twice:
     [Figure: four-state diagram with two "Predict Taken" states and two "Predict Not Taken" states, transitions labeled T and NT]
     • Red: stop, not taken
     • Green: go, taken
     • Adds hysteresis to decision making process
     • Generalize to n-bit saturating counter

     Consider 3 Scenarios
     • Branch for loop test
     • Check for error or exception
     • Alternating taken / not-taken
       – example?
     • Your worst-case prediction scenario
     • How could HW predict "this loop will execute 3 times" using a simple mechanism?
     [Slide annotation: Global history predictors]

     Correlating Branches
     Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)
       – Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction
     • (2,2) predictor: 2-bit global, 2-bit local
     [Figure: Branch address (4 bits) and 2-bit recent global branch history (01 = not taken then taken) select a Prediction from 2-bits per branch local predictors]

     Accuracy of Different Schemes (Figure 3.15, p. 206)
     [Figure: bar chart, Frequency of Mispredictions (0% to 20%) for nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, li; series: 4,096 entries: 2-bits per entry; Unlimited entries: 2-bits/entry; 1,024 entries (2,2)]

     What's missing in this picture?
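The Dynamic Branch Prediction slide's claim that a 1-bit history entry mispredicts a loop branch twice per loop execution, and the 2-bit slide's hysteresis fix, can be checked with a small Python simulation. This is a sketch under stated assumptions: a single predictor entry with no aliasing, a loop branch taken 9 times and then not taken, and a plain 2-bit saturating counter standing in for the state diagram on the slide. The function names and initial states are hypothetical.

```python
def one_bit_mispredicts(outcomes, last_taken=True):
    """Count mispredictions when predicting whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken  # 1-bit entry simply records the latest outcome
    return misses


def two_bit_mispredicts(outcomes, counter=3):
    """Count mispredictions for a 2-bit saturating counter (0..3).

    Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# Assumed loop behaviour: taken 9 times, then falls through, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
total = len(loop_branch)

for name, misses in (("1-bit", one_bit_mispredicts(loop_branch)),
                     ("2-bit", two_bit_mispredicts(loop_branch))):
    print(f"{name}: {misses} mispredictions, accuracy {1 - misses / total:.1%}")
```

After the first pass, the 1-bit entry misses both the loop exit and the first iteration of the next execution, giving roughly 80% accuracy, while the 2-bit counter misses only the exit and stays near 90%. Feeding in an alternating pattern such as `[True, False] * 50` is one way to explore the scenarios slide.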
