
Chapter 7: Digital Design and Computer Architecture, 2nd Edition (PowerPoint presentation)

David Money Harris and Sarah L. Harris

Chapter 7 :: Topics
• Introduction
• Performance Analysis
• Single-Cycle Processor
• Multicycle Processor


  1. Extended Functionality: j
     [Figure: single-cycle datapath extended for j. A new Jump control signal drives a multiplexer that can select PCJump as the next PC. PCJump is formed by concatenating PCPlus4[31:28] with Instr[25:0] shifted left by 2 (bits 27:0).]
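The PCJump wiring in the figure can be modeled at the bit level in a few lines. This is an illustrative sketch rather than part of the original slides. The function name `jump_target` is hypothetical, and addresses are treated as unsigned 32-bit integers.

```python
def jump_target(pc: int, instr: int) -> int:
    """Model of PCJump for a j instruction located at address pc.

    PCJump = {PCPlus4[31:28], Instr[25:0], 2'b00}
    """
    pc_plus4 = (pc + 4) & 0xFFFFFFFF      # PCPlus4, wrapped to 32 bits
    addr = instr & 0x03FFFFFF             # Instr[25:0], the jump address field
    return (pc_plus4 & 0xF0000000) | (addr << 2)


# j with address field 0x0100004, located at 0x00400000
print(hex(jump_target(0x00400000, 0x08100004)))  # 0x400010
```

The upper four bits come from PC + 4, not PC. For a j at 0x0FFFFFFC, the target lies in the 0x1xxxxxxx region.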

  2–3. Control Unit: Main Decoder

     Instruction | Op5:0  | RegWrite | RegDst | ALUSrc | Branch | MemWrite | MemtoReg | ALUOp1:0 | Jump
     R-type      | 000000 |    1     |   1    |   0    |   0    |    0     |    0     |    10    |  0
     lw          | 100011 |    1     |   0    |   1    |   0    |    0     |    1     |    00    |  0
     sw          | 101011 |    0     |   X    |   1    |   0    |    1     |    X     |    00    |  0
     beq         | 000100 |    0     |   X    |   0    |   1    |    0     |    X     |    01    |  0
     j           | 000010 |    0     |   X    |   X    |   X    |    0     |    X     |    XX    |  1
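The main decoder is a pure lookup from opcode to control signals, so it can be sketched as a dictionary. This Python model is an illustration, not the book's HDL. The names `MAIN_DECODER` and `main_decode` are hypothetical, and don't-care (X) entries are represented as `None`.

```python
X = None  # don't-care entry from the truth table

# Op5:0 -> control signals, transcribed from the main decoder table
MAIN_DECODER = {
    0b000000: dict(RegWrite=1, RegDst=1, ALUSrc=0, Branch=0, MemWrite=0, MemtoReg=0, ALUOp=0b10, Jump=0),  # R-type
    0b100011: dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0, MemWrite=0, MemtoReg=1, ALUOp=0b00, Jump=0),  # lw
    0b101011: dict(RegWrite=0, RegDst=X, ALUSrc=1, Branch=0, MemWrite=1, MemtoReg=X, ALUOp=0b00, Jump=0),  # sw
    0b000100: dict(RegWrite=0, RegDst=X, ALUSrc=0, Branch=1, MemWrite=0, MemtoReg=X, ALUOp=0b01, Jump=0),  # beq
    0b000010: dict(RegWrite=0, RegDst=X, ALUSrc=X, Branch=X, MemWrite=0, MemtoReg=X, ALUOp=X, Jump=1),     # j
}


def main_decode(op: int) -> dict:
    """Return the control signals for a 6-bit opcode."""
    if op not in MAIN_DECODER:
        raise ValueError(f"opcode {op:06b} is not handled by this decoder")
    return MAIN_DECODER[op]


print(main_decode(0b100011)["MemtoReg"])  # 1 (lw writes memory data back)
```

Note that RegWrite and MemWrite are never don't-cares. An unintended write would corrupt architectural state, so these enables must always have defined values.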

  4. Review: Processor Performance
     Program execution time = (# instructions)(cycles/instruction)(seconds/cycle)
                            = # instructions × CPI × T_c

  5. Single-Cycle Performance
     [Figure: single-cycle datapath with the lw path highlighted.]
     T_c is limited by the critical path (lw).

  6. Single-Cycle Performance
     • Single-cycle critical path:
       T_c = t_pcq_PC + t_mem + max(t_RFread, t_sext + t_mux) + t_ALU + t_mem + t_mux + t_RFsetup
     • Typically, the limiting paths are through the memory, ALU, and register file. The register-file read is usually slower than sign extension plus a multiplexer, so the max() term reduces to t_RFread:
       T_c = t_pcq_PC + 2t_mem + t_RFread + t_mux + t_ALU + t_RFsetup

  7–8. Single-Cycle Performance Example

     Element              | Parameter | Delay (ps)
     Register clock-to-Q  | t_pcq_PC  | 30
     Register setup       | t_setup   | 20
     Multiplexer          | t_mux     | 25
     ALU                  | t_ALU     | 200
     Memory read          | t_mem     | 250
     Register file read   | t_RFread  | 150
     Register file setup  | t_RFsetup | 20

     T_c = t_pcq_PC + 2t_mem + t_RFread + t_mux + t_ALU + t_RFsetup
         = [30 + 2(250) + 150 + 25 + 200 + 20] ps
         = 925 ps
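The cycle-time arithmetic above can be checked with a short script. The sketch hard-codes the simplified critical-path formula, which assumes max(t_RFread, t_sext + t_mux) = t_RFread. The dictionary keys are hypothetical names that mirror the table's parameters.

```python
# Delays in picoseconds, copied from the example table.
DELAYS_PS = {
    "t_pcq_PC": 30, "t_setup": 20, "t_mux": 25, "t_ALU": 200,
    "t_mem": 250, "t_RFread": 150, "t_RFsetup": 20,
}


def single_cycle_tc(d: dict) -> int:
    """Simplified single-cycle critical path (the lw path).

    Assumes the register-file read dominates sign extension plus a mux.
    """
    return (d["t_pcq_PC"] + 2 * d["t_mem"] + d["t_RFread"]
            + d["t_mux"] + d["t_ALU"] + d["t_RFsetup"])


print(single_cycle_tc(DELAYS_PS))  # 925
```

The two memory accesses (instruction fetch and data read) contribute 500 ps. They account for more than half of the 925 ps cycle.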

  9. Single-Cycle Performance Example
     Program with 100 billion instructions:
     Execution Time = # instructions × CPI × T_c
                    = (100 × 10^9)(1)(925 × 10^-12 s)
                    = 92.5 seconds
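The execution-time equation translates directly into a helper function. This is a minimal sketch: `execution_time_s` is a hypothetical name, and T_c is taken in picoseconds to match the example.

```python
def execution_time_s(n_instructions: int, cpi: float, tc_ps: float) -> float:
    """Execution time = # instructions x CPI x Tc, with Tc in picoseconds."""
    return n_instructions * cpi * tc_ps / 1e12   # 1 s = 10^12 ps


print(execution_time_s(100 * 10**9, 1, 925))  # 92.5
```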

  10. Multicycle MIPS Processor
     • Single-cycle:
       + simple
       - cycle time limited by the longest instruction (lw)
       - 2 adders/ALUs and 2 memories
     • Multicycle:
       + higher clock speed
       + simpler instructions run faster
       + reuse expensive hardware on multiple cycles
       - sequencing overhead paid many times
     • Same design steps: datapath and control

  11. Multicycle State Elements
     • Replace the separate instruction and data memories with a single unified memory, which is more realistic
     [Figure: multicycle state elements: PC (with enable), unified instruction/data memory, and register file.]

  12. Multicycle Datapath: Instruction Fetch
     STEP 1: Fetch instruction
     [Figure: PC addresses the unified memory; the fetched word is latched into the instruction register (Instr) when IRWrite is asserted.]

  13. Multicycle Datapath: lw Register Read
     STEP 2a: Read source operands from RF
     [Figure: Instr[25:21] (rs) drives register file port A1; RD1 is latched into register A.]

  14. Multicycle Datapath: lw Immediate
     STEP 2b: Sign-extend the immediate
     [Figure: Instr[15:0] is sign-extended to produce SignImm.]

  15. Multicycle Datapath: lw Address
     STEP 3: Compute the memory address
     [Figure: the ALU adds SrcA (register A) and SrcB (SignImm) under ALUControl2:0; ALUResult is latched into ALUOut.]

  16. Multicycle Datapath: lw Memory Read
     STEP 4: Read data from memory
     [Figure: the IorD multiplexer selects ALUOut (input 1) rather than PC (input 0) as the memory address Adr; the read data is latched into the Data register.]

  17. Multicycle Datapath: lw Write Register
     STEP 5: Write data back to register file
     [Figure: Data is written to the register selected by Instr[20:16] (rt) when RegWrite is asserted.]

  18. Multicycle Datapath: Increment PC
     STEP 6: Increment PC
     [Figure: ALUSrcA selects PC and ALUSrcB1:0 = 01 selects the constant 4; the ALU computes PC + 4, which is written to PC when PCWrite is asserted.]

  19. Multicycle Datapath: sw
     • Write data in rt to memory
     [Figure: RD2 (the register selected by Instr[20:16], rt) is latched into register B and drives the memory write data WD; MemWrite enables the write.]

  20. Multicycle Datapath: R-Type
     • Read from rs and rt
     • Write ALUResult to register file
     • Write to rd (instead of rt)
     [Figure: RegDst selects Instr[15:11] (rd) instead of Instr[20:16] (rt) as the destination register; MemtoReg selects ALUOut instead of Data as the write data.]

  21. Multicycle Datapath: beq
     • rs == rt?
     • BTA = (sign-extended immediate << 2) + (PC + 4)
     [Figure: the ALU compares rs and rt and produces Zero; PCEn = PCWrite OR (Branch AND Zero) enables the PC; PCSrc selects ALUOut (the branch target) as the next PC.]
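The branch target formula above depends on interpreting the 16-bit immediate as a signed value. The sketch models that step with hypothetical helpers `sign_extend16` and `branch_target`, using 32-bit unsigned addresses. The sample word 0x11090003 encodes beq $t0, $t1, 3.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, instr: int) -> int:
    """BTA = (SignImm << 2) + (PC + 4), wrapped to 32 bits."""
    sign_imm = sign_extend16(instr)          # Instr[15:0], sign-extended
    return (pc + 4 + (sign_imm << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x20, 0x11090003)))   # 0x30: 3 words past PC + 4
print(hex(branch_target(0x100, 0x1109FFFF)))  # 0x100: offset -1 branches to itself
```

The immediate counts words relative to PC + 4, not bytes relative to PC. That is why an offset of -1 produces a branch to itself.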

  22. Multicycle Processor
     [Figure: complete multicycle processor. The control unit takes Op (Instr[31:26]) and Funct (Instr[5:0]) and produces PCWrite, Branch, IorD, PCSrc, ALUControl2:0, ALUSrcB1:0, ALUSrcA, IRWrite, MemWrite, MemtoReg, RegDst, and RegWrite; PCEn enables the PC.]

  23. Multicycle Control
     [Figure: the control unit consists of a main controller (FSM) and an ALU decoder. The main controller takes Opcode5:0 and produces multiplexer selects (MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA), register enables (IRWrite, MemWrite, PCWrite, Branch, RegWrite), and ALUOp1:0. The ALU decoder combines ALUOp1:0 with Funct5:0 to produce ALUControl2:0.]

  24. Main Controller FSM: Fetch
     S0: Fetch (entered on Reset)
     [Figure: multicycle processor annotated with the control-signal values for the Fetch state.]

  25. Main Controller FSM: Fetch
     S0: Fetch: IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
     (Enables listed by name alone, such as IRWrite and PCWrite, are asserted; unlisted enables are 0.)
     [Figure: the instruction at PC is read into the instruction register while the ALU computes PC + 4 (ALUControl = 010) and writes it to PC.]

  26. Main Controller FSM: Decode
     S0: Fetch → S1: Decode
     [Figure: in the Decode state the register file reads the source operands; PCWrite and IRWrite are deasserted and most multiplexer selects are don't-cares.]

  27. Main Controller FSM: Address
     From S1: Decode, Op = LW or Op = SW → S2: MemAdr
     [Figure: datapath values for the memory-address computation (ALUControl = 010).]

  28. Main Controller FSM: Address
     S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (the ALU adds register A and SignImm)

  29. Main Controller FSM: lw
     From S2, Op = LW → S3: MemRead (IorD = 1) → S4: Mem Writeback (RegDst = 0, MemtoReg = 1, RegWrite)

  30. Main Controller FSM: sw
     From S2, Op = SW → S5: MemWrite (IorD = 1, MemWrite)

  31. Main Controller FSM: R-Type
     From S1, Op = R-type → S6: Execute (ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10) → S7: ALU Writeback (RegDst = 1, MemtoReg = 0, RegWrite)

  32. Main Controller FSM: beq
     • S1: Decode now also computes the branch target while the ALU is otherwise idle: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (PC, which already holds PC + 4, plus SignImm << 2)
     • From S1, Op = BEQ → S8: Branch (ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch): the ALU subtracts rt from rs, and if Zero is set the PC is loaded with the target saved in ALUOut

  33. Multicycle Controller FSM
     • S0: Fetch (on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite → S1
     • S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 → S2 if Op = LW or SW; S6 if Op = R-type; S8 if Op = BEQ
     • S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 → S3 if Op = LW; S5 if Op = SW
     • S3: MemRead: IorD = 1 → S4
     • S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite → S0
     • S5: MemWrite: IorD = 1, MemWrite → S0
     • S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 → S7
     • S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite → S0
     • S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch → S0

  34. Extended Functionality: addi
     From S1, Op = ADDI → S9: ADDI Execute → S10: ADDI Writeback

  35. Main Controller FSM: addi
     • S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
     • S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite

  36. Extended Functionality: j
     [Figure: multicycle datapath extended for j. PCSrc widens to two bits and selects among ALUResult (00), ALUOut (01), and PCJump (10); PCJump concatenates PC[31:28] with Instr[25:0] shifted left by 2.]

  37. Main Controller FSM: j
     • From S1, Op = J → S11: Jump
     • PCSrc becomes a 2-bit signal: S0: Fetch now sets PCSrc = 00 and S8: Branch sets PCSrc = 01

  38. Main Controller FSM: j
     • S11: Jump: PCSrc = 10, PCWrite
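The next-state logic of the completed FSM (S0 through S11) can be transcribed into a small simulation. Walking each opcode around the loop back to S0 gives its cycle count. This is a behavioral sketch, not HDL. Opcodes are represented as readable names such as "LW" and "RTYPE" instead of 6-bit Op values, which is a modeling assumption, and the function names are hypothetical.

```python
# Next-state logic of the multicycle main controller FSM (S0 to S11).
DECODE_NEXT = {"LW": "S2", "SW": "S2", "RTYPE": "S6", "BEQ": "S8", "ADDI": "S9", "J": "S11"}
MEMADR_NEXT = {"LW": "S3", "SW": "S5"}
FIXED_NEXT = {
    "S0": "S1",                # Fetch -> Decode
    "S3": "S4",                # MemRead -> Mem Writeback
    "S4": "S0", "S5": "S0",    # Mem Writeback, MemWrite -> Fetch
    "S6": "S7", "S7": "S0",    # Execute -> ALU Writeback -> Fetch
    "S8": "S0",                # Branch -> Fetch
    "S9": "S10", "S10": "S0",  # ADDI Execute -> ADDI Writeback -> Fetch
    "S11": "S0",               # Jump -> Fetch
}


def next_state(state: str, op: str) -> str:
    if state == "S1":
        return DECODE_NEXT[op]     # Decode branches on the opcode
    if state == "S2":
        return MEMADR_NEXT[op]     # MemAdr branches on LW vs. SW
    return FIXED_NEXT[state]


def cycles_per_instruction(op: str) -> int:
    """Count clock cycles from S0 until the FSM returns to S0."""
    state, cycles = "S0", 0
    while True:
        state = next_state(state, op)
        cycles += 1
        if state == "S0":
            return cycles


for op in DECODE_NEXT:
    print(op, cycles_per_instruction(op))
```

The loop reports 5 cycles for lw, 4 for sw, R-type, and addi, and 3 for beq and j. These counts feed the CPI calculation that follows.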

  39. Multicycle Processor Performance
     • Instructions take different numbers of cycles:
       – 3 cycles: beq, j
       – 4 cycles: R-type, sw, addi
       – 5 cycles: lw
     • CPI is the weighted average over the instruction mix
     • SPECINT2000 benchmark:
       – 25% loads
       – 10% stores
       – 11% branches
       – 2% jumps
       – 52% R-type
     Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
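The weighted-average CPI can be computed directly from the mix and the per-class cycle counts. In this sketch, the class names and the `average_cpi` function are hypothetical. The fractions and cycle counts are taken from the slide.

```python
# Instruction mix (SPECINT2000, from the slide) and cycles per instruction class.
MIX = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}
CYCLES = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}


def average_cpi(mix: dict, cycles: dict) -> float:
    """Weighted average CPI: sum over classes of (fraction x cycles)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction-mix fractions must sum to 1")
    return sum(frac * cycles[op] for op, frac in mix.items())


print(round(average_cpi(MIX, CYCLES), 2))  # 4.12
```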

  40. Multicycle Processor Performance
     Multicycle critical path:
     T_c = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup
     [Figure: multicycle processor with the critical path highlighted.]

  41–42. Multicycle Performance Example
     Using the same element delays as the single-cycle example (t_pcq_PC = 30 ps, t_setup = 20 ps, t_mux = 25 ps, t_ALU = 200 ps, t_mem = 250 ps):
     T_c = t_pcq_PC + t_mux + max(t_ALU + t_mux, t_mem) + t_setup
         = t_pcq_PC + t_mux + t_mem + t_setup        (since t_mem = 250 ps > t_ALU + t_mux = 225 ps)
         = [30 + 25 + 250 + 20] ps
         = 325 ps
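The same kind of check works for the multicycle critical path. The `max()` term selects whichever of the ALU path and the memory path is slower. This is a minimal sketch with hypothetical key and function names.

```python
DELAYS_PS = {"t_pcq_PC": 30, "t_setup": 20, "t_mux": 25, "t_ALU": 200, "t_mem": 250}


def multicycle_tc(d: dict) -> int:
    """Tc = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup."""
    return d["t_pcq_PC"] + d["t_mux"] + max(d["t_ALU"] + d["t_mux"], d["t_mem"]) + d["t_setup"]


print(multicycle_tc(DELAYS_PS))  # 325 (the memory path dominates)
```

With a faster memory, for example t_mem = 150 ps, the ALU path would become critical and T_c would drop to 300 ps.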

  43–45. Multicycle Performance Example
     Program with 100 billion instructions:
     Execution Time = (# instructions) × CPI × T_c
                    = (100 × 10^9)(4.12)(325 × 10^-12 s)
                    = 133.9 seconds
     This is slower than the single-cycle processor (92.5 seconds). Why?
       – Not all steps are the same length: every cycle must be long enough for the slowest step (the 250 ps memory access), so shorter steps waste part of their cycle
       – Sequencing overhead is paid on every step (t_pcq + t_setup = 50 ps), 4.12 times per instruction on average instead of once
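The comparison becomes clearer per instruction: average time is CPI × T_c, and sequencing overhead is CPI × (t_pcq + t_setup). This sketch only re-derives those products from numbers already on the slides. The variable names are hypothetical.

```python
T_PCQ, T_SETUP = 30, 20            # ps; sequencing overhead paid once per clock cycle
SINGLE_CPI, SINGLE_TC = 1, 925     # single-cycle: CPI, Tc in ps
MULTI_CPI, MULTI_TC = 4.12, 325    # multicycle: CPI, Tc in ps

overhead_per_cycle = T_PCQ + T_SETUP                   # 50 ps

# Average time per instruction = CPI x Tc
single_time = SINGLE_CPI * SINGLE_TC                   # 925 ps
multi_time = MULTI_CPI * MULTI_TC                      # about 1339 ps

# Sequencing overhead per instruction = CPI x (t_pcq + t_setup)
single_overhead = SINGLE_CPI * overhead_per_cycle      # 50 ps
multi_overhead = MULTI_CPI * overhead_per_cycle        # about 206 ps

print(f"time/instr: single {single_time} ps, multicycle {multi_time:.0f} ps")
print(f"overhead/instr: single {single_overhead} ps, multicycle {multi_overhead:.0f} ps")
```

Multiplying each per-instruction time by 100 × 10^9 instructions reproduces the 92.5 s and 133.9 s totals.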

  46. Review: Single-Cycle Processor
     [Figure: complete single-cycle processor, including the Jump multiplexer and PCJump.]

  47. Review: Multicycle Processor
     [Figure: complete multicycle processor, including the 3-input PCSrc1:0 multiplexer and PCJump.]

  48. Pipelined MIPS Processor
     • Temporal parallelism
     • Divide the single-cycle processor into 5 stages:
       – Fetch
       – Decode
       – Execute
       – Memory
       – Writeback
     • Add pipeline registers between stages

  49. Single-Cycle vs. Pipelined
     [Figure: timing diagrams on a 0 to 1900 ps axis. Single-cycle: instruction 2 starts only after instruction 1 finishes all five steps (Fetch Instruction, Decode/Read Reg, Execute ALU, Memory Read/Write, Write Reg). Pipelined: instructions 1, 2, and 3 overlap, each one stage behind the previous.]

  50. Pipelined Processor Abstraction
     [Figure: pipeline diagram over cycles 1 to 10. Each instruction passes through IM, RF (read), ALU, DM, and RF (write), starting one cycle after the previous instruction.]
     lw  $s2, 40($0)
     add $s3, $t1, $t2
     sub $s4, $s1, $s5
     and $s5, $t5, $t6
     sw  $s6, 20($s1)
     or  $s7, $t3, $t4
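The abstraction diagram follows a simple rule in an ideal pipeline: instruction i is in stage s during cycle i + s + 1, counting from 0. This sketch generates the diagram from that rule, assuming one instruction enters Fetch per cycle with no stalls or flushes. The stage letters F, D, E, M, W and the function name are hypothetical.

```python
STAGES = ["F", "D", "E", "M", "W"]   # Fetch, Decode, Execute, Memory, Writeback


def pipeline_rows(n_instrs: int) -> list[list[str]]:
    """Stage occupied by each instruction in each cycle of an ideal pipeline.

    Row i, column c (0-based) holds the stage of instruction i in cycle c + 1,
    or "." if the instruction is not in the pipeline during that cycle.
    """
    n_cycles = n_instrs + len(STAGES) - 1
    rows = []
    for i in range(n_instrs):
        row = ["."] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage     # instruction i is in stage s during cycle i + s + 1
        rows.append(row)
    return rows


program = ["lw", "add", "sub", "and", "sw", "or"]
for name, row in zip(program, pipeline_rows(len(program))):
    print(f"{name:<4}" + " ".join(row))
```

Six instructions finish in 6 + 5 - 1 = 10 cycles, which matches the 10-cycle diagram.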

  51. Single-Cycle & Pipelined Datapath
     [Figure: the single-cycle datapath (top) and the same datapath divided into Fetch, Decode, Execute, Memory, and Writeback stages by pipeline registers (bottom). Signal names carry a stage suffix, e.g. PCF, InstrD, SrcAE, ALUOutM, ResultW.]

  52. Corrected Pipelined Datapath
     [Figure: pipelined datapath in which the destination register number travels down the pipeline as WriteRegE, WriteRegM, and WriteRegW.]
     • WriteReg must arrive at the register file at the same time as Result. Otherwise, when an instruction reaches Writeback, its result would be written to the destination register of a later instruction.

  53. Pipelined Processor Control
     • Same control unit as the single-cycle processor
     • Control signals are delayed to the proper pipeline stage
     [Figure: pipelined processor with control signals pipelined alongside the data (RegWriteD/E/M/W, MemtoRegD/E/M/W, MemWriteD/E/M, BranchD/E/M, ALUControlD/E, ALUSrcD/E, RegDstD/E, PCSrcM).]

  54. Pipeline Hazards
     • Occur when an instruction depends on the result of an instruction that has not yet completed
     • Types:
       – Data hazard: register value not yet written back to the register file
       – Control hazard: next instruction not yet decided (caused by branches)

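  55. Data Hazard
     [Figure: pipeline diagram over cycles 1 to 8. add writes $s0 in cycle 5, but the and and or instructions read $s0 in cycles 3 and 4, before it is written. sub reads $s0 in cycle 5, which works because the register file is written in the first half of a cycle and read in the second half.]
     add $s0, $s2, $s3
     and $t0, $s0, $s1
     or  $t1, $s4, $s0
     sub $t2, $s0, $s5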

  56. Handling Data Hazards
     • Insert nops in code at compile time
     • Rearrange code at compile time
     • Forward data at run time
     • Stall the processor at run time

  57. Compile-Time Hazard Elimination
     • Insert enough nops for the result to be ready
     • Or move independent useful instructions forward
     [Figure: pipeline diagram over cycles 1 to 10 with two nops inserted after add, so that and reads $s0 in the same cycle that add writes it.]
     add $s0, $s2, $s3
     nop
     nop
     and $t0, $s0, $s1
     or  $t1, $s4, $s0
     sub $t2, $s0, $s5

  58. Data Forwarding
     [Figure: the same add/and/or/sub sequence over cycles 1 to 8, with the $s0 result forwarded from add's Memory and Writeback stages directly to the Execute stage of the dependent and and or instructions.]

  59. Data Forwarding
     [Figure: pipelined processor with a hazard unit. 3-input multiplexers controlled by ForwardAE and ForwardBE choose each Execute-stage operand from the register file (00), ResultW (01), or ALUOutM (10). The hazard unit inputs include RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW.]

  60. Data Forwarding
     • Forward to the Execute stage from either:
       – the Memory stage, or
       – the Writeback stage
     • Forwarding logic for ForwardAE (the Memory stage is checked first because it holds the more recent result):
         if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then
           ForwardAE = 10
         else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then
           ForwardAE = 01
         else
           ForwardAE = 00
     • Forwarding logic for ForwardBE is the same, but with rtE in place of rsE
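The forwarding conditions can be expressed as a single function that serves both ForwardAE (pass rsE) and ForwardBE (pass rtE). This behavioral sketch is not the book's HDL. The function and parameter names are hypothetical, and registers are given as MIPS register numbers ($s0 = 16).

```python
def forward_select(src_e: int, write_reg_m: int, reg_write_m: bool,
                   write_reg_w: int, reg_write_w: bool) -> int:
    """2-bit select for ForwardAE (src_e = rsE) or ForwardBE (src_e = rtE)."""
    if src_e != 0 and src_e == write_reg_m and reg_write_m:
        return 0b10   # forward ALUOutM from the Memory stage (most recent result)
    if src_e != 0 and src_e == write_reg_w and reg_write_w:
        return 0b01   # forward ResultW from the Writeback stage
    return 0b00       # no hazard: use the register file output


# and $t0, $s0, $s1 in Execute while add $s0, ... is in Memory
print(forward_select(16, 16, True, 0, False))  # 2, i.e. 0b10
```

Checking `src_e != 0` matters because $0 is hardwired to zero. An instruction that names $0 as its destination must never forward a value.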

  61. Stalling
     [Figure: pipeline diagram over cycles 1 to 8, marked "Trouble!": lw produces $s0 only at the end of its Memory stage, too late to forward to the Execute stage of the immediately following and.]
     lw  $s0, 40($0)
     and $t0, $s0, $s1
     or  $t1, $s4, $s0
     sub $t2, $s0, $s5

  62. Stalling
     [Figure: pipeline diagram over cycles 1 to 9. The and instruction is held in Decode for one extra cycle and or is held in Fetch, inserting a stall (bubble) so that the lw result can be forwarded from the Writeback stage.]

  63. Stalling Hardware
     [Figure: pipelined processor in which the hazard unit drives StallF and StallD (enables on the PC and Decode pipeline registers) and FlushE (clear on the Execute pipeline register); the hazard unit also receives MemtoRegE.]

  64. Stalling Logic
     lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
     StallF = StallD = FlushE = lwstall
     (A lw in the Execute stage writes register rtE; MemtoRegE identifies the instruction as a load.)
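The load-use stall condition maps directly onto a boolean function. This sketch uses hypothetical names and MIPS register numbers ($s0 = 16, $s1 = 17). It shows a lw followed immediately by a dependent and, which is the case from the stalling diagram.

```python
def lw_stall(rs_d: int, rt_d: int, rt_e: int, mem_to_reg_e: bool) -> bool:
    """True when the instruction in Decode needs the result of a lw in Execute."""
    return bool(mem_to_reg_e) and (rs_d == rt_e or rt_d == rt_e)


def stall_controls(lwstall: bool) -> dict:
    """StallF = StallD = FlushE = lwstall."""
    return {"StallF": lwstall, "StallD": lwstall, "FlushE": lwstall}


# lw $s0, 40($0) in Execute (rtE = 16); and $t0, $s0, $s1 in Decode (rsD = 16, rtD = 17)
print(stall_controls(lw_stall(16, 17, 16, True)))
```

Freezing Fetch and Decode while flushing Execute inserts exactly one bubble. After that, the loaded value can be forwarded from Writeback.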

  65. Control Hazards
     • beq:
       – The branch is not determined until the 4th stage of the pipeline (Memory)
       – Instructions after the branch are fetched before the branch occurs
       – These instructions must be flushed if the branch is taken
     • Branch misprediction penalty:
       – The number of instructions flushed when the branch is taken
       – May be reduced by determining the branch earlier

  66. Control Hazards: Original Pipeline
     [Figure: pipelined processor with forwarding and stalling; the branch decision PCSrcM is formed in the Memory stage from BranchM and ZeroM, and the branch target PCBranchM is also selected in Memory.]

  67. Control Hazards
     [Figure: pipeline diagram over cycles 1 to 9. The beq at 0x20 resolves in its Memory stage, so the three instructions fetched after it (and, or, sub) must be flushed before slt at the branch target 0x64 is fetched.]
     Address  Instruction
     0x20     beq $t1, $t2, 40
     0x24     and $t0, $s0, $s1
     0x28     or  $t1, $s4, $s0
     0x2C     sub $t2, $s0, $s5
     0x30     ...
     ...
     0x64     slt $t3, $s2, $s3

  68. Early Branch Resolution
     [Figure: the branch comparison moves into the Decode stage. An equality comparator on the register file outputs produces EqualD, PCSrcD = BranchD AND EqualD, and the branch target PCBranchD is computed in Decode. The Decode pipeline register gains a clear (CLR) input.]
     • This introduced another data hazard in the Decode stage

  69. Early Branch Resolution
     [Figure: pipeline diagram over cycles 1 to 9. With the branch resolved in Decode, only the and instruction at 0x24 must be flushed before slt at 0x64 is fetched.]
     Address  Instruction
     0x20     beq $t1, $t2, 40
     0x24     and $t0, $s0, $s1   (flushed)
     0x28     or  $t1, $s4, $s0
     0x2C     sub $t2, $s0, $s5
     0x30     ...
     ...
     0x64     slt $t3, $s2, $s3

  70. Handling Data & Control Hazards
     [Figure: full pipelined processor with forwarding to the Execute stage (ForwardAE, ForwardBE) and to the Decode-stage comparator (ForwardAD, ForwardBD), plus stalling and flushing (StallF, StallD, FlushE). The hazard unit also receives BranchD, RegWriteE, RegWriteM, and MemtoRegE.]

  71. Control Forwarding & Stalling Logic
     • Forwarding logic:
         ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
         ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM
     • Stalling logic:
         branchstall = [BranchD AND RegWriteE AND ((WriteRegE == rsD) OR (WriteRegE == rtD))]
                       OR [BranchD AND MemtoRegM AND ((WriteRegM == rsD) OR (WriteRegM == rtD))]
         StallF = StallD = FlushE = lwstall OR branchstall
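The Decode-stage forwarding and branch-stall equations can be sketched the same way as the Execute-stage logic. A branch must stall if its source is still being computed in Execute or still being loaded in Memory. In both cases the value is not yet available to the comparator in Decode. The function names and register numbers ($t1 = 9, $t2 = 10) are illustrative assumptions.

```python
def forward_to_decode(src_d: int, write_reg_m: int, reg_write_m: bool) -> bool:
    """ForwardAD (src_d = rsD) or ForwardBD (src_d = rtD)."""
    return src_d != 0 and src_d == write_reg_m and bool(reg_write_m)


def branch_stall(branch_d: bool, rs_d: int, rt_d: int,
                 reg_write_e: bool, write_reg_e: int,
                 mem_to_reg_m: bool, write_reg_m: int) -> bool:
    # Source operand still being produced by the ALU in Execute
    alu_pending = reg_write_e and write_reg_e in (rs_d, rt_d)
    # Source operand still being loaded from memory in the Memory stage
    load_pending = mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(branch_d and (alu_pending or load_pending))


def hazard_controls(lwstall: bool, branchstall: bool) -> dict:
    stall = lwstall or branchstall
    return {"StallF": stall, "StallD": stall, "FlushE": stall}


# add $t1, ... in Execute, then beq $t1, $t2, ... in Decode: must stall
print(branch_stall(True, 9, 10, True, 9, False, 0))  # True
```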
