PIPELINING: 5-STAGE PIPELINE
Mahdi Nazm Bojnordi
Assistant Professor, School of Computing, University of Utah
CS/ECE 6810: Computer Architecture
Overview
¨ Announcement
¤ Tonight: Homework 1 deadline (11:59PM)
▪ Verify your uploaded files before the deadline
¨ This lecture
¤ Impacts of pipelining on performance
¤ The MIPS five-stage pipeline
¤ Pipeline hazards
▪ Structural hazards
▪ Data hazards
Single-cycle RISC Architecture
¨ Example: simple MIPS architecture
¤ Critical path includes all of the processing steps
[Figure: single-cycle datapath with PC, Inst. Memory, Register File, ALU, Data Memory, and Controller; steps: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back]
Single-cycle RISC Architecture
¨ Example program
¤ CT = 6 ns; CPU Time = IC x CPI x CT = 5 x 1 x 6 ns = 30 ns
    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6
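The calculation on this slide can be checked with a short Python sketch. The function name `cpu_time_ns` is illustrative and does not come from the lecture.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU Time = IC x CPI x CT, with the cycle time given in nanoseconds."""
    return instruction_count * cpi * cycle_time_ns

# Single-cycle design from the slide: 5 instructions, CPI = 1, CT = 6 ns.
single_cycle = cpu_time_ns(5, 1, 6)
print(single_cycle)  # 30
```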
How to improve?
Reusing Idle Resources
¨ Each processing step finishes in a fraction of a cycle
¤ Idle resources can be reused for processing the next instructions
[Figure: the same datapath divided into Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back steps; components: PC, Inst. Memory, Register File, ALU, Data Memory]
Pipelined Architecture
¨ Five stage pipeline
¤ Critical path determines the cycle time
[Figure: five-stage pipelined datapath (Inst. Fetch, Inst. Decode, Execute, Memory, Write Back) with PC, Inst. Memory, Register File, ALU, and Data Memory; stage delays shown: 1.5 ns, 1.5 ns, 1.25 ns, 1.05 ns, 0.7 ns]
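As a rough illustration of how the cycle time follows from the stage delays in the figure, the sketch below compares the single-cycle and pipelined clocks. It assumes the five delays are the only contributions to the critical path and ignores pipeline register overhead; the variable names are illustrative.

```python
# Stage delays from the figure, in picoseconds to keep the arithmetic exact.
# Which stage each delay belongs to is not assumed here.
stage_delays_ps = [1500, 1500, 1250, 1050, 700]

single_cycle_ct_ps = sum(stage_delays_ps)  # one long cycle covers every step
pipelined_ct_ps = max(stage_delays_ps)     # clock is set by the slowest stage

print(single_cycle_ct_ps / 1000, pipelined_ct_ps / 1000)  # 6.0 1.5
```

The two results match the cycle times used in the example programs: 6 ns for the single-cycle design and 1.5 ns for the pipeline.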
Pipelined Architecture
¨ Example program
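¤ CT = 1.5 ns; without overlapping instructions, each one takes all five cycles (CPI = 5)
¤ CPU Time = IC x CPI x CT = 5 x 5 x 1.5 ns = 37.5 ns > 30 ns, which is worse than the single-cycle design
    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6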
Pipelined Architecture
¨ Example program
¤ CT = 1.5 ns; with overlapped execution, the five instructions finish in 5 + 4 = 9 cycles
¤ CPU Time = 9 cycles x 1.5 ns = 13.5 ns
    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6
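The two pipelined timings (37.5 ns without overlap, 13.5 ns with overlap) can be reproduced with a small sketch. It assumes an ideal pipeline with no stalls, and the function names are illustrative rather than taken from the lecture.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal overlap: n_stages cycles for the first instruction, then one more per instruction."""
    return n_stages + (n_instructions - 1)

CT_NS = 1.5  # pipelined cycle time from the slide
print(unpipelined_cycles(5, 5) * CT_NS)  # 37.5
print(pipelined_cycles(5, 5) * CT_NS)    # 13.5
```

For long programs the fill cost of `n_stages - 1` cycles becomes negligible, so the effective CPI approaches 1.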
What is the cost of pipelining?
Pipelining Technique
¨ Improving throughput at the expense of latency
¤ Delay: D = T + nδ
¤ Throughput: IPS = n/(T + nδ)
▪ T: total combinational logic delay; n: number of pipeline stages; δ: per-stage latch overhead (the example values imply T = 30 and δ = 1)
[Figure: the same combinational logic built as one, two, and three pipeline stages]
Stages (n)   Critical path delay per stage   D    IPS
1            30                              31   1/31
2            15                              32   2/32
3            10                              33   3/33
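The delay and throughput formulas can be evaluated directly. The sketch below assumes T = 30 and δ = 1, the values implied by the figure's results, and uses exact fractions so the throughput values are not rounded; the function names are illustrative.

```python
from fractions import Fraction

def pipeline_delay(total_logic_delay, n_stages, latch_overhead):
    """Latency of one operation: D = T + n * delta."""
    return total_logic_delay + n_stages * latch_overhead

def pipeline_throughput(total_logic_delay, n_stages, latch_overhead):
    """Operations per unit time: IPS = n / (T + n * delta)."""
    return Fraction(n_stages, pipeline_delay(total_logic_delay, n_stages, latch_overhead))

for n in (1, 2, 3):
    # Fractions print in lowest terms, so 2/32 appears as 1/16 and 3/33 as 1/11.
    print(n, pipeline_delay(30, n, 1), pipeline_throughput(30, n, 1))
```

Delay grows by δ with every added stage while throughput rises, which is the latency-for-throughput trade the slide describes.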
Pipelining Latency vs. Throughput
¨ Theoretical delay and throughput models for perfect pipelining
[Plot: relative performance versus number of pipeline stages, with curves for Delay (D) and Throughput (IPS)]
Five Stage MIPS Pipeline
Simple Five Stage Pipeline
¨ A pipelined load-store architecture that processes up to one instruction per cycle
[Figure: five-stage datapath: Inst. Fetch, Inst. Decode, Execute, Memory, Write Back; components: PC, Inst. Memory, Register File, ALU, Data Memory]
Instruction Fetch
¨ Read an instruction from memory (I-Memory)
¤ Use the program counter (PC) to index into the I-Memory
¤ Compute NPC by incrementing the current PC
▪ What about branches?
¨ Update pipeline registers
¤ Write the instruction into the pipeline registers
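The fetch steps above can be sketched as a single function. This is a simplified model, not the lecture's implementation: it assumes 4-byte instructions in a byte-addressed memory, represents the I-Memory as a Python list of words, and ignores branches. The names `fetch`, `imem`, and `if_id` are illustrative.

```python
def fetch(pc, instruction_memory):
    """Return the contents written to the pipeline register for the instruction at `pc`."""
    instruction = instruction_memory[pc // 4]  # word index from a byte address
    npc = pc + 4                               # sequential next PC
    return {"instruction": instruction, "npc": npc}

imem = ["AND R1,R2,R3", "XOR R4,R2,R3", "SUB R5,R1,R4"]
if_id = fetch(4, imem)
print(if_id)  # {'instruction': 'XOR R4,R2,R3', 'npc': 8}
```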
Instruction Fetch
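[Figure: instruction fetch datapath: PC, +4 adder, I-Memory, and a clocked pipeline register; signals: NPC, Instruction, Branch Target]
¤ NPC = PC + 4
▪ Why increment by 4? (Each MIPS instruction is 4 bytes and memory is byte-addressed.)
¤ Critical Path = Max{P1, P2, P3}, where P1, P2, and P3 are the paths marked in the figure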
Instruction Decode
¨ Generate control signals from the opcode bits
¨ Read source operands from the register file (RF)
¤ Use the register specifiers to index the RF
▪ How many read ports are required?
¨ Update pipeline registers
¤ Send the operand and immediate values to the next stage
¤ Pass control signals and NPC to the next stage
Instruction Decode
[Figure: instruction decode datapath between two pipeline registers; labels: Instruction, NPC, decode, Register File, ctrl, reg, reg, target]
Execute Stage
¨ Perform ALU operation
¤ Compute the ALU result
▪ Operation type: control signals
▪ First operand: contents of a register
▪ Second operand: either a register or the immediate value
¤ Compute branch target
▪ Target = NPC + immediate
¨ Update pipeline registers
¤ Control signals, branch target, ALU results, and destination register
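A minimal sketch of the execute stage, following the operand selection and branch-target rule listed above. The function signature, the `use_immediate` flag, and the operation table are assumptions made for illustration. The branch target follows the slide (Target = NPC + immediate) and therefore treats the immediate as a byte offset; the real MIPS encoding shifts the branch offset left by two bits first.

```python
def execute(alu_op, reg_a, reg_b, immediate, use_immediate, npc):
    """Return the ALU result and the branch target for one instruction."""
    second_operand = immediate if use_immediate else reg_b
    operations = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "AND": lambda a, b: a & b,
        "XOR": lambda a, b: a ^ b,
    }
    result = operations[alu_op](reg_a, second_operand)
    branch_target = npc + immediate  # as on the slide: Target = NPC + immediate
    return result, branch_target

print(execute("SUB", 7, 2, 16, False, 8))  # (5, 24)
```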
Execute Stage
[Figure: execute stage datapath: ALU and branch-target adder (+) between two pipeline registers; labels: ctrl, NPC, reg, reg, Target, Res]
Memory Access
¨ Access data memory
¤ Load/store address: ALU result
¤ Control signals determine read or write access
¨ Update pipeline registers
¤ ALU results from execute
¤ Loaded data from D-Memory
¤ Destination register
Memory Access
[Figure: memory access datapath: data Memory (addr, data in, data out) between two pipeline registers; labels: ctrl, Target, Res, reg, Dat]
Register Write Back
¨ Update register file
¤ Control signals determine if a register write is needed
¤ Only one write port is required
▪ Write the ALU result to the destination register, or
▪ Write the loaded data into the register file
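The write-back choice can be sketched as a two-way selection guarded by a write enable. The control signal names `reg_write` and `mem_to_reg` are hypothetical, chosen here for illustration, and the register file is modeled as a plain dictionary.

```python
def write_back(register_file, dest, reg_write, mem_to_reg, alu_result, loaded_data):
    """Apply the write-back stage to `register_file` (a dict) and return it."""
    if reg_write:  # instructions that do not write a register leave it untouched
        register_file[dest] = loaded_data if mem_to_reg else alu_result
    return register_file

regs = {"R1": 0, "R5": 0}
write_back(regs, "R5", True, False, alu_result=12, loaded_data=None)  # ALU instruction
write_back(regs, "R1", True, True, alu_result=64, loaded_data=99)     # load
print(regs)  # {'R1': 99, 'R5': 12}
```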
Five Stage Pipeline
¨ Ideal pipeline: IPC=1
¤ Are there enough resources to keep the pipeline stages busy all the time?
[Figure: five-stage pipeline: Inst. Fetch (PC, +4, Mem), Decode (Reg. File), Execute (ALU), Memory (Mem), Writeback (Reg. File)]
Pipeline Hazards
¨ Structural hazards: multiple instructions compete for the same resource
¨ Data hazards: a dependent instruction cannot proceed because it needs a value that hasn’t been produced
¨ Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown
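To make the data-hazard definition concrete, the sketch below lists the read-after-write dependences in the five-instruction example program used earlier in the lecture. The tuple encoding `(opcode, destination, sources)` and the function name are assumptions for illustration. Whether a given dependence actually stalls the pipeline depends on the distance between the two instructions and on the hardware, which this sketch does not model.

```python
def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for read-after-write pairs."""
    found = []
    for j, (_, _, sources) in enumerate(program):
        for src in sources:
            # Walk backwards to the most recent earlier writer of src.
            for i in range(j - 1, -1, -1):
                if program[i][1] == src:
                    found.append((i, j, src))
                    break
    return found

program = [
    ("AND", "R1", ("R2", "R3")),
    ("XOR", "R4", ("R2", "R3")),
    ("SUB", "R5", ("R1", "R4")),
    ("ADD", "R6", ("R1", "R4")),
    ("MUL", "R7", ("R5", "R6")),
]
for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer][0]} needs {reg} from {program[producer][0]}")
```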
Structural Hazards
¨ 1. Unified memory for instruction and data
    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]
¤ Solution: separate instruction and data memories.
Structural Hazards
¨ 1. Unified memory for instruction and data
¨ 2. Register file with shared read/write access ports
    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]
¤ Solution for 2: access the register file in half cycles (write in the first half of the cycle, read in the second half).
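The first structural hazard (a unified memory) can be located with a short sketch that finds the cycles in which one instruction is being fetched while another accesses data memory. It assumes one instruction enters the pipeline per cycle with no stalls, and marks each instruction only by whether it is a load or store; the names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(program):
    """Return (cycle, fetching_index, accessing_index) for IF/MEM collisions.

    Instruction i (0-based) is in stage s during cycle i + s + 1 (cycles start at 1).
    """
    conflicts = []
    for i, uses_data_memory in enumerate(program):
        if not uses_data_memory:
            continue
        mem_cycle = i + STAGES.index("MEM") + 1
        fetch_index = mem_cycle - 1  # instruction whose IF falls in that cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, fetch_index, i))
    return conflicts

# True marks loads: R1 <- Mem[R2], R7 <- R1+R0, R6 <- R4-R5, R3 <- Mem[R20]
print(memory_conflicts([True, False, False, True]))  # [(4, 3, 0)]
```

In cycle 4 the first load reads data memory while the fourth instruction is being fetched, which is the collision that separate instruction and data memories remove.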