
CS104 Computer Organization and Design: Datapaths (Hilton)

CS104 Computer Organization and Design: Datapaths. CS104 (Hilton): Datapaths [Slides adapted from A. Roth’s]. Admin • Homework: Homework 4 out tonight, due Monday, March 26th • Download/check your submissions • Reading: Chapter


  1. Micro-architectural factors • Micro-architecture: • The details of how the ISA is implemented • Affects CPI and clock frequency • Often will look at a fixed program and consider MIPS • Million Instructions Per Second • MIPS = IPC * Frequency (in MHz) • IPC = Instructions Per Cycle (1 / CPI) • Gives a “bigger is better” number • (Instructions / Cycle) x (Cycles / Second) = Instructions / Second, i.e. IPC x Frequency = Throughput CS104 (Hilton) : Datapaths [Adapted from slides by A. Roth] 30
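
The MIPS relation on this slide can be checked with a few lines of Python. This is only a sketch: the function names `ipc_from_cpi` and `mips` and the example machine (CPI of 4 at 100 MHz) are illustrative assumptions, not values from the slides.

```python
def ipc_from_cpi(cpi: float) -> float:
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


def mips(ipc: float, freq_mhz: float) -> float:
    """Millions of instructions per second = IPC * frequency in MHz."""
    return ipc * freq_mhz


if __name__ == "__main__":
    # Hypothetical machine: CPI of 4 running at 100 MHz.
    print(mips(ipc_from_cpi(4.0), 100.0))
```

Because frequency is given in MHz, the product is already in millions of instructions per second and needs no further scaling.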

  2. “Best” IPC • For now, best we can do: IPC = 1 (CPI = 1) • Do 1 instruction every cycle • Later: • Real processors can do multiple instructions at once! • Potentially: IPC > 1! • Best possible IPC depends on design

  3. Performance vs …. • 1990s: Performance at all cost • Actually more like “clock frequency” at all cost… • Now: Care about other things • Energy (electric bill, battery life) • Power (cooling, also affects energy) • Area (chip cost) • Reliability (tolerance of transient faults, e.g., charged particle strikes) • … • Important metric these days: “Performance / Watt” • Throughput divided by power consumption • Why?

  4. Performance Modeling and Analysis • Speaking of performance • Making a processor takes time (years) and money (millions) • Want to know it will perform well before you finish • If it’s wrong, doing it all over is painful… • Performance can be simulated in software • Estimate what IPC will be • Guide design • This is my other job by the way…

  5. Single-Cycle Datapath Performance [Figure: single-cycle datapath with PC, Insn Mem, Register File, Data Mem, and control ROM/random logic] • Goes against the make common case fast (MCCF) principle • + Low Cycles Per Instruction (CPI): 1 • – Long clock period: to accommodate slowest insn

  6. Alternative: Multi-Cycle Datapath [Figure: multi-cycle datapath with PC, Insn Mem, IR, Register File, A/B/O/D latches, and Data Mem] • Multi-cycle datapath: attacks high clock period • Cut datapath into multiple stages (5 here), isolate using FFs • FSM control “walks” insns thru stages (by staging control signals) • + Insns can bypass stages and exit early

  7. Finite State Machine (FSM) • FSM = States + Transitions • Next state: function of current state + inputs • Outputs: function of current state + inputs • Canonical Example: Combination Lock • Must enter 3 8 4 to unlock • P.S. Useful in software too

  8. Finite State Machines: Example [Figure: state diagram with the Start state only] • Combination Lock Example: • Need to enter 3 8 4 to unlock • Initial State: no valid piece of combo seen

  9. Finite State Machines: Example [Figure: state diagram; Start goes to state 1 on input 3 and stays in Start on 0-2,4-9] • Combination Lock Example: • Need to enter 3 8 4 to unlock • Input of 3: transition to new state • Any other input: stay in same state

  10. Finite State Machines: Example [Figure: state diagram; state 1 goes to state 2 on 8, stays in state 1 on 3, and returns to Start on 0-2,4-7,9] • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 1: • Input = 8? Goto state 2

  11. Finite State Machines: Example [Figure: state diagram; state 2 goes to state 3 on 4, to state 1 on 3, and returns to Start on 0-2,5-9] • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 2: • Input = 4? Goto state 3

  12. Finite State Machines: Example [Figure: complete state diagram for the 3 8 4 lock] • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 3: Unlock!
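
Since the slides note that FSMs are useful in software too, here is a minimal Python sketch of the 3 8 4 combination lock. The state names and the helper `run` are illustrative. The transition out of state 3 (back to Start on any input) is an assumption taken from the ROM entry `0x3_ = 001` given later in the deck, since the state diagram itself shows no edge leaving state 3.

```python
START, SAW_3, SAW_38, UNLOCKED = 0, 1, 2, 3


def next_state(state: int, digit: int) -> int:
    """Next-state function for the 3 8 4 combination lock."""
    if state == UNLOCKED:
        return START        # assumption: any input after unlocking resets the lock
    if digit == 3:
        return SAW_3        # a 3 always (re)starts the combination
    if state == SAW_3 and digit == 8:
        return SAW_38
    if state == SAW_38 and digit == 4:
        return UNLOCKED
    return START            # any other input: back to Start


def run(digits) -> int:
    """Feed a sequence of digits to the FSM and return the final state."""
    state = START
    for d in digits:
        state = next_state(state, d)
    return state


if __name__ == "__main__":
    print(run([3, 8, 4]) == UNLOCKED)
    print(run([3, 8, 5]) == UNLOCKED)
```

Keeping the next-state logic in a pure function of (current state, input) mirrors the slide's definition and makes each edge of the diagram easy to test on its own.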

  13. FSM in Hardware • Flip flop(s) to hold state(s) • Combinatorial logic to determine next state/output • (Assumes FF enable on input_valid)

  14-20. FSM Hardware Example [seven figure-only slides; no text in transcript]

  21. FSM Implementation: ROM [Figure: the K-bit input and N-bit state form an (N + K)-bit address into a 2^(N+K)-entry ROM; the ROM produces the M-bit output and the N-bit next state, which is held in a register] • Just saw: FSM implemented with sum-of-products • Remind us what that is? • Can also be implemented with a ROM

  22. FSM ROM Implementation Example • Combination Lock (3 8 4) Example • 4-bit input • 2-bit state • 64-entry ROM (indexed with S1 S0 I3 I2 I1 I0) • Each entry needs 3 bits (S1 S0 U) • 2 for next state • 1 for unlock signal • Example entries in ROM • 0x00 = 000 • 0x03 = 010 • 0x18 = 100 • 0x13 = 010 • 0x3_ = 001
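
The ROM contents for the lock can be generated mechanically from the state diagram. The sketch below fills all 64 entries and can be checked against the example entries on the slide. Two points are assumptions rather than slide content: input codes 10 through 15 (not decimal digits) are treated like any other wrong input, and the unlock bit is asserted only while the current state is 3, which is what `0x3_ = 001` suggests.

```python
def build_rom() -> list:
    """Return the 64-entry ROM: address = S1 S0 I3 I2 I1 I0, data = S1 S0 U."""
    rom = [0] * 64
    for state in range(4):
        for digit in range(16):        # 4-bit input; 10-15 treated as wrong input (assumption)
            if state == 3:
                nxt, unlock = 0, 1     # unlocked: assert U, return to Start
            elif digit == 3:
                nxt, unlock = 1, 0     # a 3 always leads to state 1
            elif state == 1 and digit == 8:
                nxt, unlock = 2, 0
            elif state == 2 and digit == 4:
                nxt, unlock = 3, 0
            else:
                nxt, unlock = 0, 0     # wrong input: back to Start
            rom[(state << 4) | digit] = (nxt << 1) | unlock
    return rom


if __name__ == "__main__":
    rom = build_rom()
    for addr in (0x00, 0x03, 0x18, 0x13, 0x24, 0x30):
        print(f"0x{addr:02x} = {rom[addr]:03b}")
```

A lookup in this table replaces the sum-of-products logic: the register output and the input digit together form the address, and the data word supplies both the next state and the unlock signal.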

  23. Multi-cycle Datapath FSM [Figure: control FSM so far: Next Insn, Decode Insn] • First state: Get a New Instruction • Output signals to fetch (e.g., read enable IMEM) • Next State: Always Decode

  24. Multi-cycle Datapath FSM [Figure: control FSM so far: Next Insn, Decode Insn, Execute Insn; NOP edge from Decode back to Next Insn] • Second State: Decode • Output signals to decode instruction (RdEn RegFile) • Go to Next Insn if NOP • Otherwise Execute

  25. Multi-cycle Datapath FSM [Figure: control FSM so far, adding a Branch edge from Execute back to Next Insn] • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Branches: Next Insn

  26. Multi-cycle Datapath FSM [Figure: control FSM so far, adding an ALU edge from Execute to Writeback] • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • ALU op: write register

  27. Multi-cycle Datapath FSM [Figure: control FSM so far, adding a Load edge from Execute to Read DMEM] • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Load: Read Memory

  28. Multi-cycle Datapath FSM [Figure: control FSM so far, adding a Store edge from Execute to Write DMEM] • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Store: Write Memory

  29. Multi-cycle Datapath FSM [Figure: complete control FSM] • Read DMEM State • Control signals enable DMEM Read • Next state is writeback

  30. Multi-cycle Datapath FSM [Figure: complete control FSM] • Writeback state • Control signals enable regfile write • Next state: Next Insn

  31. Multi-cycle Datapath FSM [Figure: complete control FSM] • Write DMEM state • Control signals enable memory write • Next state: Next Insn
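
The control FSM built up over the last nine slides can be written as a small next-state function. The sketch below uses made-up string names for states and instruction kinds, and counts how many cycles each kind spends before control returns to the fetch state.

```python
def next_state(state: str, kind: str) -> str:
    """Next state of the multi-cycle control FSM for an insn of the given kind."""
    if state == "fetch":                      # "Next Insn" on the slides
        return "decode"
    if state == "decode":
        return "fetch" if kind == "nop" else "execute"
    if state == "execute":
        return {
            "branch": "fetch",
            "alu": "writeback",
            "load": "read_dmem",
            "store": "write_dmem",
        }[kind]
    if state == "read_dmem":
        return "writeback"
    return "fetch"                            # writeback and write_dmem both finish the insn


def cycles(kind: str) -> int:
    """Number of states (cycles) an insn of this kind visits."""
    state, n = "fetch", 0
    while True:
        n += 1
        state = next_state(state, kind)
        if state == "fetch":
            return n


if __name__ == "__main__":
    for kind in ("nop", "branch", "alu", "store", "load"):
        print(kind, cycles(kind))
```

Under this reading of the diagram, branches take 3 cycles, ALU ops and stores 4, and loads 5, which agrees with the per-type cycle counts used in the CPI slides that follow.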

  32-35. Multi-Cycle Datapath Example: Add [Figure: multi-cycle datapath, built up over four slides] • Example: Add • Cycle 1: Read IMEM • Cycle 2: Decode + Read RF • Cycle 3: ALU • Cycle 4: Writeback + Increment PC

  36. Multi-Cycle Datapath Performance [Figure: multi-cycle datapath] • Opposite performance split of single-cycle datapath • + Short clock period • – High CPI

  37-40. Multi-cycle Data-path CPI • CPI depends on instructions • Branches / Jumps: 3 cycles • ALU: 4 cycles • Stores: 4 cycles • Loads: 5 cycles • Overall CPI is weighted average • Example: • 20% loads, 15% stores, 20% branches, 45% ALU • CPI = 0.20 * 5 + 0.15 * 4 + 0.20 * 3 + 0.45 * 4 = 4.0
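
The weighted average is easy to reproduce. In this sketch the dictionary keys and the function name `weighted_cpi` are illustrative; the fractions and cycle counts are the ones from the slide.

```python
CYCLES = {"load": 5, "store": 4, "branch": 3, "alu": 4}
MIX = {"load": 0.20, "store": 0.15, "branch": 0.20, "alu": 0.45}


def weighted_cpi(mix: dict, cycles: dict) -> float:
    """CPI as the sum of (fraction of insns) * (cycles for that insn type)."""
    return sum(frac * cycles[kind] for kind, frac in mix.items())


if __name__ == "__main__":
    # 0.20*5 + 0.15*4 + 0.20*3 + 0.45*4
    print(round(weighted_cpi(MIX, CYCLES), 2))
```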

  41. Multi-cycle Datapath Performance • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50 ns/insn • Multi-cycle • Clock period = 10ns • CPI = (0.2*3+0.2*5+0.6*4) = 4 (20% branches, 20% loads, 60% other) • Performance = 40 ns/insn • But wait…

  42. Multi-Cycle Datapath Performance [Figure: multi-cycle datapath] • Did not just cut up existing logic into 5 pieces • Also added logic (flip flops) • So clock period not 1/5 of single cycle, but slightly longer

  43. Multi-cycle Datapath Performance • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50 ns/insn • Multi-cycle • Clock period = 12ns • CPI = (0.2*3+0.2*5+0.6*4) = 4 • Performance = 48 ns/insn • Better, but not as exciting… • Can we do better still? • Have our cake (low CPI) and eat it too (high clock frequency)?

  44. Clock Period and CPI • Single-cycle datapath • + Low CPI: 1 • – Long clock period: to accommodate slowest insn • Timeline: [insn0.fetch, dec, exec] [insn1.fetch, dec, exec] • Multi-cycle datapath • + Short clock period • – High CPI • Timeline: [insn0.fetch] [insn0.dec] [insn0.exec] [insn1.fetch] [insn1.dec] [insn1.exec] • Can we have both low CPI and short clock period? • – No good way to make a single insn go faster • + Insn latency doesn’t matter anyway … insn throughput matters • Key: exploit inter-insn parallelism

  45. Pipelining • Pipelining: important performance technique • Improves insn throughput rather than insn latency • Exploits parallelism at insn-stage level to do so • Begin with multi-cycle design • Timeline: [insn0.fetch] [insn0.dec] [insn0.exec] [insn1.fetch] [insn1.dec] [insn1.exec] • When insn advances from stage 1 to 2, next insn enters stage 1 • Timeline: [insn0.fetch] [insn0.dec + insn1.fetch] [insn0.exec + insn1.dec] [insn1.exec] • Individual insns take same number of stages • + But insns enter and leave at a much faster rate • Physically breaks “atomic” VN (von Neumann) loop ... but must maintain illusion • Automotive assembly line analogy
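
The throughput benefit can be seen by counting cycles. The sketch below compares a multi-cycle design with an ideal pipeline, assuming every insn passes through every stage and there are no stalls (both simplifications; stalls are covered later in the deck). The function names are illustrative.

```python
def multicycle_cycles(n_insns: int, cycles_per_insn: int) -> int:
    """One insn at a time: total cycles is simply the product."""
    return n_insns * cycles_per_insn


def pipelined_cycles(n_insns: int, n_stages: int) -> int:
    """Ideal pipeline: the first insn needs n_stages cycles to drain,
    and each later insn finishes one cycle after its predecessor."""
    if n_insns == 0:
        return 0
    return n_stages + (n_insns - 1)


if __name__ == "__main__":
    for n in (1, 3, 1000):
        print(n, multicycle_cycles(n, 5), pipelined_cycles(n, 5))
```

For a single insn the two designs take the same number of cycles, which is the point about latency; as the insn count grows, the pipelined total approaches one cycle per insn.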

  46. 5 Stage Multi-Cycle Datapath [Figure: 5-stage multi-cycle datapath]

  47. 5 Stage Pipelined Datapath [Figure: 5-stage pipelined datapath with PC, IR, A, B, O, D latched between stages] • Temporary values (PC,IR,A,B,O,D) re-latched every stage • Why? 5 insns may be in pipeline at once, they share a single PC? • Notice, PC not latched after ALU stage (why not?)

  48. Pipeline Terminology [Figure: pipelined datapath with latches labeled F/D, D/X, X/M, M/W] • Stages: Fetch, Decode, eXecute, Memory, Writeback • Latches (pipeline registers): PC, F/D, D/X, X/M, M/W

  49. Some More Terminology • Scalar pipeline: one insn per stage per cycle • Alternative: “superscalar” (next unit) • In-order pipeline: insns enter execute stage in VN order • Alternative: “out-of-order” (not covered in CSE 371) • Pipeline depth: number of pipeline stages • Nothing magical about five • Trend has been to deeper pipelines

  50. Pipeline Example: Cycle 1 [Figure: pipelined datapath] • F: add $3,$2,$1 • 3 instructions

  51. Pipeline Example: Cycle 2 [Figure: pipelined datapath] • F: lw $4,0($5) • D: add $3,$2,$1

  52. Pipeline Example: Cycle 3 [Figure: pipelined datapath] • F: sw $6,4($7) • D: lw $4,0($5) • X: add $3,$2,$1

  53. Pipeline Example: Cycle 4 [Figure: pipelined datapath] • D: sw $6,4($7) • X: lw $4,0($5) • M: add $3,$2,$1 • 3 instructions

  54. Pipeline Example: Cycle 5 [Figure: pipelined datapath] • X: sw $6,4($7) • M: lw $4,0($5) • W: add

  55. Pipeline Example: Cycle 6 [Figure: pipelined datapath] • M: sw $6,4($7) • W: lw

  56. Pipeline Example: Cycle 7 [Figure: pipelined datapath] • W: sw

  57. Pipeline Diagram • Pipeline diagram: shorthand for what we just saw • Across: cycles • Down: insns • Convention: X means lw $4,0($5) finishes execute stage and writes into X/M latch at end of cycle 4
                       1  2  3  4  5  6  7  8  9
      add $3,$2,$1     F  D  X  M  W
      lw $4,0($5)         F  D  X  M  W
      sw $6,4($7)            F  D  X  M  W
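
A pipeline diagram for a stall-free scalar pipeline can be generated programmatically, since insn i simply occupies stage s in cycle i + s + 1. The sketch below is illustrative only: `pipeline_diagram` and `render` are made-up names, and it assumes no stalls.

```python
STAGES = "FDXMW"


def pipeline_diagram(insns):
    """Return (insn, cells) rows; cells[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(insns) + len(STAGES) - 1
    rows = []
    for i, insn in enumerate(insns):
        cells = [" "] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage       # insn i enters stage s in cycle i + s + 1
        rows.append((insn, cells))
    return rows


def render(rows) -> str:
    """Format the rows as an aligned text table with a cycle-number header."""
    width = max(len(insn) for insn, _ in rows)
    n_cycles = len(rows[0][1])
    header = " " * width + "".join(f"{c:>3}" for c in range(1, n_cycles + 1))
    body = [insn.ljust(width) + "".join(f"{cell:>3}" for cell in cells)
            for insn, cells in rows]
    return "\n".join([header] + body)


if __name__ == "__main__":
    print(render(pipeline_diagram(["add $3,$2,$1", "lw $4,0($5)", "sw $6,4($7)"])))
```

Reading the lw row at cycle 4 gives X, matching the convention stated on the slide.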

  58. What About Pipelined Control? • Should it be like single-cycle control? • But individual insn signals must be staged • Should it be like multi-cycle control? • But all stages are simultaneously active • How many different controllers are we going to need? • One for each insn in pipeline? • Solution: use simple single-cycle control, but pipeline it • Single controller

  59. Pipelined Control [Figure: pipelined datapath with a single CTRL block in decode; control signals xC, mC, wC are latched in D/X, then mC, wC in X/M, then wC in M/W]

  60. Pipeline Performance Calculation • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50ns/insn • Multi-cycle • Branch: 20% (3 cycles), load: 20% (5 cycles), other: 60% (4 cycles) • Clock period = 12ns, CPI = (0.2*3+0.2*5+0.6*4) = 4 • Remember: latching overhead makes it 12, not 10 • Performance = 48ns/insn • Pipelined • Clock period = 12ns • CPI = 1.5 (on average insn completes every 1.5 cycles) • Performance = 18ns/insn
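
The three results on this slide all come from the same product, time per insn = clock period x CPI. The sketch below reproduces them; the names `ns_per_insn`, `multicycle_cpi`, and `MIX` are illustrative, while the numbers are the ones from the slide.

```python
# (fraction of insns, cycles) per insn class, from the slide.
MIX = {"branch": (0.20, 3), "load": (0.20, 5), "other": (0.60, 4)}


def multicycle_cpi(mix=MIX) -> float:
    """Weighted-average CPI of the multi-cycle design."""
    return sum(frac * cycles for frac, cycles in mix.values())


def ns_per_insn(clock_period_ns: float, cpi: float) -> float:
    """Average time per insn: clock period times cycles per insn."""
    return clock_period_ns * cpi


if __name__ == "__main__":
    designs = {
        "single-cycle": ns_per_insn(50.0, 1.0),
        "multi-cycle": ns_per_insn(12.0, multicycle_cpi()),
        "pipelined": ns_per_insn(12.0, 1.5),
    }
    for name, t in designs.items():
        print(f"{name}: {t:.0f} ns/insn")
```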

  61. Q1: Why Is Pipeline Clock Period … • … > delay thru datapath / number of pipeline stages? • Latches (FFs) add delay • Pipeline stages have different delays, clock period is max delay • Both factors have implications for ideal number of pipeline stages

  62. Q2: Why Is Pipeline CPI… • … > 1? • CPI for scalar in-order pipeline is 1 + stall penalties • Stalls used to resolve hazards • Hazard: condition that jeopardizes VN illusion • Stall: artificial pipeline delay introduced to restore VN illusion • Calculating pipeline CPI • Frequency of stall * stall cycles • Penalties add (stalls generally don’t overlap in in-order pipelines) • CPI = 1 + stall-freq_1 * stall-cyc_1 + stall-freq_2 * stall-cyc_2 + … • Correctness/performance/MCCF • Long penalties OK if they happen rarely, e.g., 1 + 0.01 * 10 = 1.1 • Stalls also have implications for ideal number of pipeline stages
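
The CPI formula on this slide is a base of 1 plus a sum of frequency-times-penalty terms. A minimal sketch, with the function name `pipeline_cpi` chosen for illustration:

```python
def pipeline_cpi(stalls) -> float:
    """CPI of a scalar in-order pipeline.

    stalls: iterable of (stall frequency per insn, stall cycles) pairs.
    Penalties are simply added, assuming stalls do not overlap.
    """
    return 1.0 + sum(freq * cycles for freq, cycles in stalls)


if __name__ == "__main__":
    # The slide's example: a 10-cycle penalty on 1% of insns.
    print(round(pipeline_cpi([(0.01, 10)]), 2))
```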

  63. Dependences and Hazards • Dependence: relationship between two insns • Data: two insns use same storage location • Control: one insn affects whether another executes at all • Not a bad thing, programs would be boring without them • Enforced by making older insn go before younger one • Happens naturally in single-/multi-cycle designs • But not in a pipeline • Hazard: dependence & possibility of wrong insn order • Effects of wrong insn order cannot be externally visible • Stall: enforce order by keeping younger insn in same stage • Hazards are a bad thing: stalls reduce performance

  64. Why Does Every Insn Take 5 Cycles? [Figure: pipelined datapath with add $3,$2,$1 and lw $4,0($5) in flight] • Could/should we allow add to skip M and go to W? No • – It wouldn’t help: peak fetch still only 1 insn per cycle • – Structural hazards: imagine add follows lw

  65. Structural Hazards • Structural hazards • Two insns trying to use same circuit at same time • E.g., structural hazard on regfile write port • To fix structural hazards: proper ISA/pipeline design • Each insn uses every structure exactly once • For at most one cycle • Always at same stage relative to F

  66. Data Hazards [Figure: pipeline stages D through W holding sw $6,0($7), lw $4,0($5), add $3,$2,$1] • Let’s forget about branches and the control for a while • The three insn sequence we saw earlier executed fine… • But it wasn’t a real program • Real programs have data dependences • They pass values via registers and memory

  67. Data Hazards [Figure: pipeline with sw $3,0($7) in D, addi $6,$3,1 in X, lw $4,0($3) in M, add $3,$2,$1 in W] • Would this “program” execute correctly on this pipeline? • Which insns would execute with correct inputs? • add is writing its result into $3 in current cycle • – lw read $3 2 cycles ago → got wrong value • – addi read $3 1 cycle ago → got wrong value • sw is reading $3 this cycle → OK (regfile timing: write first half)

  68. Memory Data Hazards [Figure: pipeline holding lw $4,0($1) directly behind sw $5,0($1)] • What about data hazards through memory? No • lw following sw to same address in next cycle, gets right value • Why? DMem read/write take place in same stage • Data hazards through registers? Yes (previous slide) • Occur because register write is 3 stages after register read • Can only read a register value 3 cycles after writing it
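
The 3-cycle rule can be turned into a simple checker: a register data hazard exists when an insn reads a register written by one of the two insns immediately before it. The sketch below applies that to the four-insn "program" from the Data Hazards slide. The tuple encoding `(name, dest, sources)` and the function `find_hazards` are assumptions made for illustration, and the check deliberately ignores intervening writes to the same register.

```python
def find_hazards(program, min_distance=3):
    """Return (producer_index, consumer_index, register) for every read that
    follows the write of the same register by fewer than min_distance insns."""
    hazards = []
    for j, (_, _, srcs) in enumerate(program):
        for i in range(max(0, j - (min_distance - 1)), j):
            dest = program[i][1]
            if dest is not None and dest in srcs:
                hazards.append((i, j, dest))
    return hazards


# (name, destination register or None, source registers)
PROGRAM = [
    ("add", 3, (2, 1)),     # add  $3,$2,$1
    ("lw", 4, (3,)),        # lw   $4,0($3)
    ("addi", 6, (3,)),      # addi: writes $6, reads $3
    ("sw", None, (3, 7)),   # sw   $3,0($7)
]

if __name__ == "__main__":
    for producer, consumer, reg in find_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]} reads ${reg} too soon after {PROGRAM[producer][0]}")
```

The checker flags lw and addi but not sw, which is three insns behind add and therefore reads $3 in the same cycle add writes it.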

  69. Fixing Register Data Hazards • Can only read register value 3 cycles after writing it • One way to enforce this: make sure programs don’t do it • Compiler puts two independent insns between write/read insn pair • If they aren’t there already • Independent means: “do not interfere with register in question” • Do not write it: otherwise meaning of program changes • Do not read it: otherwise create new data hazard • Code scheduling: compiler moves around existing insns to do this • If none can be found, must use nops • This is called software interlocks • MIPS: Microprocessor w/out Interlocked Pipeline Stages

  70. Software Interlock Example • Original code: add $3,$2,$1 ; lw $4,0($3) ; sw $7,0($3) ; add $6,$2,$8 ; addi $3,$5,4 • Can any of last three insns be scheduled between first two? • sw $7,0($3)? No, creates hazard with add $3,$2,$1 • add $6,$2,$8? OK • addi $3,$5,4? No, lw would read $3 from it • Still need one more insn, use nop • Scheduled code: add $3,$2,$1 ; add $6,$2,$8 ; nop ; lw $4,0($3) ; sw $7,0($3) ; addi $3,$5,4
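
The nop-insertion half of software interlocking is mechanical once the code has been scheduled. The sketch below pads a sequence so that every read is at least three slots after the write it depends on. The tuple encoding `(text, dest, sources)` and the function `insert_nops` are illustrative assumptions; the code scheduling step itself (moving add $6,$2,$8 up) is done by hand here.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with nops so each reader is >= min_distance slots after its writer."""
    out = []
    for insn in program:
        srcs = insn[2]
        needed = 0
        # Look back over the insns (and nops) already emitted.
        for back in range(1, min_distance):
            if back <= len(out):
                dest = out[-back][1]
                if dest is not None and dest in srcs:
                    needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(insn)
    return out


# (text, destination register or None, source registers)
ORIGINAL = [
    ("add $3,$2,$1", 3, (2, 1)),
    ("lw $4,0($3)", 4, (3,)),
    ("sw $7,0($3)", None, (7, 3)),
    ("add $6,$2,$8", 6, (2, 8)),
    ("addi $3,$5,4", 3, (5,)),
]
# Hand-scheduled order from the slide: add $6,$2,$8 moved between add and lw.
SCHEDULED = [ORIGINAL[0], ORIGINAL[3], ORIGINAL[1], ORIGINAL[2], ORIGINAL[4]]

if __name__ == "__main__":
    for text, _, _ in insert_nops(SCHEDULED):
        print(text)
```

Without scheduling the original order needs two nops after the first add; with add $6,$2,$8 moved up, only one nop remains, as on the slide.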

  71. Software Interlock Performance • Same deal • Branch: 20%, load: 20%, store: 10%, other: 50% • Software interlocks • 20% of insns require insertion of 1 nop • 5% of insns require insertion of 2 nops • CPI is still 1 technically • But now there are more insns • #insns = 1 + 0.20*1 + 0.05*2 = 1.3 • – 30% more insns (30% slowdown) due to data hazards
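
The instruction-count growth from inserted nops follows the same pattern as a stall calculation, except that the cost shows up as extra insns rather than extra cycles per insn. A minimal sketch, with `insn_count_factor` as an illustrative name:

```python
def insn_count_factor(nop_rules) -> float:
    """Dynamic insn count relative to the original program.

    nop_rules: iterable of (fraction of insns, nops inserted for each) pairs.
    """
    return 1.0 + sum(frac * nops for frac, nops in nop_rules)


if __name__ == "__main__":
    # 20% of insns need 1 nop, 5% need 2 nops.
    factor = insn_count_factor([(0.20, 1), (0.05, 2)])
    print(f"{factor:.2f}x insns, {100 * (factor - 1):.0f}% slowdown at CPI = 1")
```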
