PIPELINING: 5-STAGE PIPELINE (Mahdi Nazm Bojnordi)


  1. PIPELINING: 5-STAGE PIPELINE
     Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah
     CS/ECE 6810: Computer Architecture

  2. Overview
     - Announcement
       - Tonight: Homework 1 deadline (11:59PM)
         - Verify your uploaded files before deadline
     - This lecture
       - Impacts of pipelining on performance
       - The MIPS five-stage pipeline
       - Pipeline hazards
         - Structural hazards
         - Data hazards

  3. Single-cycle RISC Architecture
     - Example: simple MIPS architecture
       - Critical path includes all of the processing steps
     [Figure: single-cycle datapath with Controller, PC, Inst. Memory, Register File, ALU, Data Memory; steps: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back]

  4. Single-cycle RISC Architecture
     - Example program
       - CT=6ns; CPU Time = ?
     AND R1,R2,R3
     XOR R4,R2,R3
     SUB R5,R1,R4
     ADD R6,R1,R4
     MUL R7,R5,R6
     CPU Time = IC x CPI x CT
     [Figure: instruction timeline; axis: Time]

  5. Single-cycle RISC Architecture
     - Example program
       - CT=6ns; CPU Time = 5 x 1 x 6ns = 30ns
     AND R1,R2,R3
     XOR R4,R2,R3
     SUB R5,R1,R4
     ADD R6,R1,R4
     MUL R7,R5,R6
     CPU Time = IC x CPI x CT
     How to improve?
     [Figure: instruction timeline; axis: Time]

  6. Reusing Idle Resources
     - Each processing step finishes in a fraction of a cycle
       - Idle resources can be reused for processing next instructions
     [Figure: datapath with PC, Inst. Memory, Register File, ALU, Data Memory; steps: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back]

  7. Pipelined Architecture
     - Five stage pipeline
       - Critical path determines the cycle time
     [Figure: five-stage datapath with PC, Inst. Memory, Register File, ALU, Data Memory; stages: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back; delay labels: 0.7ns, 1.5ns, 1.05ns, 1.25ns, 1.5ns]
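As a rough check on the figure's numbers, the sketch below takes the five delay labels as given (in picoseconds, to keep the arithmetic exact) and computes the pipelined cycle time as the slowest stage. Which label belongs to which stage is not recoverable from this transcript, so the list order is an assumption, and the variable names are illustrative.

```python
# Delay labels from the slide, converted to picoseconds.
# Assumption: the order is arbitrary; the transcript does not say which
# delay belongs to which stage.
stage_delays_ps = [1500, 1050, 1250, 1500, 700]

# In a pipeline every stage gets one clock cycle, so the slowest stage
# sets the cycle time.
pipelined_ct_ps = max(stage_delays_ps)

# Without pipeline registers, one instruction passes through every step
# in a single cycle, so the delays add up.
single_cycle_ct_ps = sum(stage_delays_ps)

print(pipelined_ct_ps / 1000, "ns")     # 1.5 ns
print(single_cycle_ct_ps / 1000, "ns")  # 6.0 ns
```

The sum matches the 6ns cycle time of the single-cycle example, and the maximum matches the 1.5ns cycle time used for the pipelined one.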

  8. Pipelined Architecture
     - Example program
       - CT=1.5ns; CPU Time = ?
     AND R1,R2,R3
     XOR R4,R2,R3
     SUB R5,R1,R4
     ADD R6,R1,R4
     MUL R7,R5,R6
     CPU Time = IC x CPI x CT
     [Figure: instruction timeline; axis: Time]

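  9. Pipelined Architecture
     - Example program
       - CT=1.5ns; CPU Time = 5 x 5 x 1.5ns = 37.5ns > 30ns. WORSE!!
       - Without overlap, each instruction occupies all five stages in turn, so CPI = 5
     AND R1,R2,R3
     XOR R4,R2,R3
     SUB R5,R1,R4
     ADD R6,R1,R4
     MUL R7,R5,R6
     CPU Time = IC x CPI x CT
     [Figure: instruction timeline; axis: Time]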

  10. Pipelined Architecture
      - Example program
        - CT=1.5ns; CPU Time = ?
      AND R1,R2,R3
      XOR R4,R2,R3
      SUB R5,R1,R4
      ADD R6,R1,R4
      MUL R7,R5,R6
      CPU Time = IC x CPI x CT
      [Figure: instruction timeline; axis: Time]

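  11. Pipelined Architecture
      - Example program
        - CT=1.5ns; CPU Time = 9 x 1 x 1.5ns = 13.5ns
        - With overlap, the program takes 9 cycles: 5 for the first instruction to complete, then 1 more for each of the remaining 4
      AND R1,R2,R3
      XOR R4,R2,R3
      SUB R5,R1,R4
      ADD R6,R1,R4
      MUL R7,R5,R6
      CPU Time = IC x CPI x CT
      What is the cost of pipelining?
      [Figure: overlapped instruction timeline; axis: Time]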
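The three CPU-time figures quoted in these example slides (30ns, 37.5ns, 13.5ns) can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes an ideal pipeline with no stalls; the function and variable names are made up for illustration.

```python
def cpu_time_ns(cycles, cycle_time_ns):
    """Total execution time = number of clock cycles x cycle time."""
    return cycles * cycle_time_ns

IC = 5      # instructions in the example program
STAGES = 5  # pipeline depth

# Single-cycle design: CPI = 1, CT = 6ns.
single_cycle = cpu_time_ns(IC * 1, 6.0)

# Five stages but no overlap: every instruction takes 5 short cycles.
no_overlap = cpu_time_ns(IC * STAGES, 1.5)

# Overlapped (pipelined) execution, assuming no stalls: the first
# instruction needs STAGES cycles, each later one finishes 1 cycle after.
pipelined_cycles = STAGES + (IC - 1)
pipelined = cpu_time_ns(pipelined_cycles, 1.5)

print(single_cycle, no_overlap, pipelined)  # 30.0 37.5 13.5
print(round(single_cycle / pipelined, 2))   # speedup over single-cycle: 2.22
```

The speedup stays below 5x here because the program is short (fill and drain cycles dominate) and because the stage delays are not perfectly balanced.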

  12. Pipelining Technique
      - Improving throughput at the expense of latency
        - Delay: D = T + nδ
        - Throughput: IPS = n/(T + nδ)
      [Figure: one block of combinational logic; critical path delay = 30]

  13. Pipelining Technique
      - Improving throughput at the expense of latency
        - Delay: D = T + nδ
        - Throughput: IPS = n/(T + nδ)
      [Figure: the same combinational logic split into 1, 2, and 3 stages]
      - 1 stage: critical path delay = 30; D = ?, IPS = ?
      - 2 stages: critical path delay = 15 each; D = ?, IPS = ?
      - 3 stages: delay = 10 each; D = ?, IPS = ?

  14. Pipelining Technique
      - Improving throughput at the expense of latency
        - Delay: D = T + nδ
        - Throughput: IPS = n/(T + nδ)
      [Figure: the same combinational logic split into 1, 2, and 3 stages]
      - 1 stage: critical path delay = 30; D = 31, IPS = 1/31
      - 2 stages: critical path delay = 15 each; D = 32, IPS = 2/32
      - 3 stages: delay = 10 each; D = 33, IPS = 3/33
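The numbers on this slide follow from the two formulas if T is read as the total combinational delay (30), n as the number of stages, and δ as a per-stage overhead of 1 time unit (for example, a pipeline register). The slides do not define the symbols explicitly, so this reading, and the value δ = 1 in particular, is an inference from D = 31, 32, 33. A small sketch under that assumption:

```python
from fractions import Fraction

def delay(T, n, delta):
    """Latency of one operation through n stages: D = T + n*delta."""
    return T + n * delta

def throughput(T, n, delta):
    """Operations per time unit with n stages: IPS = n / (T + n*delta)."""
    return Fraction(n, delay(T, n, delta))

T, DELTA = 30, 1  # assumed values, inferred from the slide's results

for n in (1, 2, 3):
    # Fractions print in lowest terms, e.g. 2/32 appears as 1/16.
    print(n, delay(T, n, DELTA), throughput(T, n, DELTA))
```

Latency grows by δ with every added stage, while throughput keeps rising but can never exceed 1/δ, which is the trade-off the slide title refers to.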

  15. Pipelining Latency vs. Throughput
      - Theoretical delay and throughput models for perfect pipelining
      [Plot: Relative Performance (0 to 20) vs. Number of Pipeline Stages (0 to 200); curve: Delay (D)]

  16. Pipelining Latency vs. Throughput
      - Theoretical delay and throughput models for perfect pipelining
      [Plot: Relative Performance (0 to 20) vs. Number of Pipeline Stages (0 to 200); curves: Delay (D), Throughput (IPS)]

  17. Five Stage MIPS Pipeline

  18. Simple Five Stage Pipeline
      - A pipelined load-store architecture that processes up to one instruction per cycle
      [Figure: five-stage datapath with PC, Inst. Memory, Register File, ALU, Data Memory; stages: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back]

  19. Instruction Fetch
      - Read an instruction from memory (I-Memory)
        - Use the program counter (PC) to index into the I-Memory
        - Compute NPC by incrementing current PC
          - What about branches?
      - Update pipeline registers
        - Write the instruction into the pipeline registers

  20. Instruction Fetch
      - NPC = PC + 4
      - Why increment by 4?
      [Figure: instruction fetch datapath; labels: clock, PC, + 4 adder, NPC, Branch Target, Instruction Memory, Pipeline Register]

  21. Instruction Fetch
      - NPC = PC + 4
      - Why increment by 4?
      - Critical Path = Max{P1, P2, P3}
      [Figure: instruction fetch datapath with three paths marked P1, P2, P3; labels: clock, PC, + 4 adder, NPC, Branch Target, Instruction Memory, Pipeline Register]
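The fetch step described on these slides can be sketched as a small function. MIPS instructions are 32 bits (4 bytes) wide and memory is byte-addressed, which is why the PC advances by 4. Everything else here is an illustrative assumption: the instruction memory is a plain Python list, the pipeline register is a dict, and the branch decision is passed in as an argument rather than produced by a later stage.

```python
WORD_BYTES = 4  # one MIPS instruction = 32 bits = 4 bytes

def instruction_fetch(pc, imem, take_branch=False, branch_target=None):
    """Return (next PC, contents of the IF/ID pipeline register)."""
    # Index the instruction memory with the byte address in PC.
    instruction = imem[pc // WORD_BYTES]
    # NPC = PC + 4: address of the sequentially next instruction.
    npc = pc + WORD_BYTES
    # A taken branch overrides the sequential next PC.
    next_pc = branch_target if take_branch else npc
    # The instruction and NPC travel to decode via the pipeline register.
    if_id = {"instruction": instruction, "npc": npc}
    return next_pc, if_id

# Hypothetical instruction memory holding part of the example program.
imem = ["AND R1,R2,R3", "XOR R4,R2,R3", "SUB R5,R1,R4"]

pc, if_id = instruction_fetch(0, imem)
print(pc, if_id)  # 4 {'instruction': 'AND R1,R2,R3', 'npc': 4}
```

In real hardware the memory read, the adder, and the next-PC selection proceed in parallel, which is what the Max{P1, P2, P3} critical-path expression reflects; the sequential code above only models the resulting values.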

  22. Instruction Decode
      - Generate control signals for the opcode bits
      - Read source operands from the register file (RF)
        - Use the specifiers for indexing RF
          - How many read ports are required?
      - Update pipeline registers
        - Send the operand and immediate values to next stage
        - Pass control signals and NPC to next stage

  23. Instruction Decode
      [Figure: decode datapath between two pipeline registers; labels: Instruction, decode, Register File, reg, ctrl, NPC, target]

  24. Execute Stage
      - Perform ALU operation
        - Compute the result of ALU
          - Operation type: control signals
          - First operand: contents of a register
          - Second operand: either a register or the immediate value
        - Compute branch target
          - Target = NPC + immediate
      - Update pipeline registers
        - Control signals, branch target, ALU results, and destination
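A minimal sketch of the execute step as listed above. The field names of the pipeline registers (`alu_op`, `use_imm`, and so on) are hypothetical, and the branch target follows the slide's formula literally (Target = NPC + immediate), which assumes the immediate is already a byte offset.

```python
import operator

# Assumed mapping from an ALU control signal to the operation it selects.
ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "XOR": operator.xor,
    "MUL": operator.mul,
}

def execute(id_ex):
    """Consume the ID/EX pipeline register, produce the EX/MEM register."""
    # Second operand: a register value or the immediate, chosen by control.
    second = id_ex["imm"] if id_ex["use_imm"] else id_ex["b"]
    result = ALU_OPS[id_ex["alu_op"]](id_ex["a"], second)
    # Branch target as on the slide: NPC + immediate.
    target = id_ex["npc"] + id_ex["imm"]
    # Forward control signals, target, result, and destination register.
    return {"ctrl": id_ex["ctrl"], "target": target,
            "result": result, "dest": id_ex["dest"]}

# Hypothetical register contents for SUB R5,R1,R4 with R1 = 7 and R4 = 5.
ex_mem = execute({"alu_op": "SUB", "a": 7, "b": 5, "imm": 16,
                  "use_imm": False, "npc": 8, "dest": "R5",
                  "ctrl": {"reg_write": True}})
print(ex_mem["result"], ex_mem["target"])  # 2 24
```

The target is computed for every instruction, as a dedicated adder in hardware would do; the control signals decide later whether it is used.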

  25. Execute Stage
      [Figure: execute datapath between two pipeline registers; labels: NPC, + (target adder), Target, ALU, Res, reg, ctrl]

  26. Memory Access
      - Access data memory
        - Load/store address: ALU outcome
        - Control signals determine read or write access
      - Update pipeline registers
        - ALU results from execute
        - Loaded data from D-Memory
        - Destination register

  27. Memory Access
      [Figure: memory access datapath between two pipeline registers; labels: Target, Res, addr, Data Memory, data, reg, ctrl]

  28. Register Write Back
      - Update register file
        - Control signals determine if a register write is needed
        - Only one write port is required
          - Write the ALU result to the destination register, or
          - Write the loaded data into the register file

  29. Five Stage Pipeline
      - Ideal pipeline: IPC=1
        - Are there enough resources to keep the pipeline stages busy all the time?
      [Figure: five-stage pipeline datapath; stages: Inst. Fetch, Decode, Execute, Memory, Writeback; components: PC, + 4 adder, Mem, Reg. File, ALU, adder, Mem, Reg. File]

  30. Pipeline Hazards

  31. Pipeline Hazards
      - Structural hazards: multiple instructions compete for the same resource
      - Data hazards: a dependent instruction cannot proceed because it needs a value that hasn't been produced
      - Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown
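To make the data-hazard definition concrete, the sketch below scans the five-instruction example program from the earlier slides and lists every read-after-write dependence, that is, each case where an instruction reads a register written by an earlier one. The tuple encoding and function name are illustrative assumptions. Whether a given dependence actually stalls the pipeline depends on details such as forwarding, which these slides do not cover.

```python
# (opcode, destination, source 1, source 2)
program = [
    ("AND", "R1", "R2", "R3"),
    ("XOR", "R4", "R2", "R3"),
    ("SUB", "R5", "R1", "R4"),
    ("ADD", "R6", "R1", "R4"),
    ("MUL", "R7", "R5", "R6"),
]

def raw_dependences(prog):
    """Return (producer index, consumer index, register) triples."""
    last_writer = {}  # register name -> index of its most recent writer
    deps = []
    for i, (_, dst, src1, src2) in enumerate(prog):
        for src in (src1, src2):
            if src in last_writer:
                deps.append((last_writer[src], i, src))
        # Record the write only after checking this instruction's reads.
        last_writer[dst] = i
    return deps

for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer][0]} needs {reg} from {program[producer][0]}")
```

For this program the scan reports six dependences: SUB and ADD each need R1 from AND and R4 from XOR, and MUL needs R5 from SUB and R6 from ADD.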

  32. Structural Hazards
      - 1. Unified memory for instruction and data
      R1 ← Mem[R2]
      R3 ← Mem[R20]
      R6 ← R4-R5
      R7 ← R1+R0

  33. Structural Hazards
      - 1. Unified memory for instruction and data
      R1 ← Mem[R2]
      R3 ← Mem[R20]
      R6 ← R4-R5
      R7 ← R1+R0
      Separate inst. and data memories.

  34. Structural Hazards
      - 1. Unified memory for instruction and data
      - 2. Register file with shared read/write access ports
      R1 ← Mem[R2]
      R3 ← Mem[R20]
      R6 ← R4-R5
      R7 ← R1+R0

  35. Structural Hazards
      - 1. Unified memory for instruction and data
      - 2. Register file with shared read/write access ports
      R1 ← Mem[R2]
      R3 ← Mem[R20]
      R6 ← R4-R5
      R7 ← R1+R0
      Register access in half cycles.
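The first structural hazard can be illustrated with a short cycle count. Assuming a single-ported unified memory, one instruction entering the pipeline per cycle, and no stalls, instruction i (counting from 0) is fetched in cycle i and reaches the Memory stage in cycle i + 3, so a load collides with the fetch of the instruction three positions behind it. The function name and the boolean encoding of the program are made up for this sketch.

```python
MEM_STAGE_OFFSET = 3  # stages: IF=0, ID=1, EX=2, MEM=3, WB=4

def unified_memory_conflicts(uses_data_memory):
    """Return (cycle, load/store index, fetched index) for each collision."""
    conflicts = []
    count = len(uses_data_memory)
    for i, uses_mem in enumerate(uses_data_memory):
        # Instruction i is in MEM in the cycle in which this one is fetched.
        fetched = i + MEM_STAGE_OFFSET
        if uses_mem and fetched < count:
            conflicts.append((i + MEM_STAGE_OFFSET, i, fetched))
    return conflicts

# The slide's program: two loads, then a subtract and an add.
program = ["R1 <- Mem[R2]", "R3 <- Mem[R20]", "R6 <- R4-R5", "R7 <- R1+R0"]
uses_data_memory = [True, True, False, False]

for cycle, mem_i, fetch_i in unified_memory_conflicts(uses_data_memory):
    print(f"cycle {cycle}: '{program[mem_i]}' (MEM) vs fetch of '{program[fetch_i]}'")
```

With the four instructions shown, the only collision is in cycle 3, between the first load's data access and the fetch of the fourth instruction; the second load would collide with whatever instruction follows the listing. Separate instruction and data memories, as the slide proposes, remove the conflict because the two accesses no longer share a port.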
