Processor Design: Pipelined Processor
Hung-Wei Tseng

Drawbacks of a single-cycle processor
• The cycle time is determined by the slowest instruction
• That can be very long: think about an instruction that fetches data from DRAM
• The hardware is mostly idle

Pipelining
• Break up the logic into pipeline stages with “pipeline registers”
• Each stage can act on a different instruction/data item
• The state and control signals of each instruction are held in pipeline registers (latches)
[Figure: a 10ns block of logic between two latches, split into five 2ns stages separated by latches]
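To make the benefit concrete, here is a minimal sketch that compares the total run time of the 10ns single-cycle design with the five-stage, 2ns-per-stage pipeline from the figure. It assumes an ideal pipeline (no hazards, no latch overhead); the function names are illustrative and not from the slides.

```python
def single_cycle_time(n_instructions, logic_delay_ns):
    """Each instruction occupies the whole datapath for one long cycle."""
    return n_instructions * logic_delay_ns


def pipelined_time(n_instructions, n_stages, stage_delay_ns):
    """Ideal pipeline: fill the pipe once, then finish one instruction per cycle."""
    cycles = n_stages + n_instructions - 1
    return cycles * stage_delay_ns


n = 1000
t_single = single_cycle_time(n, 10)   # 10ns of logic per instruction
t_pipe = pipelined_time(n, 5, 2)      # five stages of 2ns each
print(f"single-cycle: {t_single} ns, pipelined: {t_pipe} ns, "
      f"speedup: {t_single / t_pipe:.2f}x")
```

The latency of a single instruction is unchanged (5 x 2ns = 10ns); the gain comes from overlapping instructions, so the speedup approaches the number of stages as the instruction count grows.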

Pipelining
[Figure: five 2ns pipeline stages separated by latches, shown for cycle #1 through cycle #5]

Cycle time of a pipelined processor
• The critical path is the longest possible delay between two registers in a design.
• The critical path sets the cycle time, since the cycle time must be long enough for a signal to traverse the critical path.
• Lengthening or shortening non-critical paths does not change performance
• Ideally, all paths are about the same length
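The points above can be sketched in a few lines. The stage delays below are made-up numbers chosen for illustration, and latch overhead is ignored.

```python
def cycle_time(stage_delays_ns):
    """The clock period must cover the slowest register-to-register path."""
    return max(stage_delays_ns.values())


# Hypothetical stage delays in ns (not taken from the slides).
balanced = {"IF": 2, "ID": 2, "EXE": 2, "MEM": 2, "WB": 2}
faster_id = dict(balanced, ID=1)    # shorten a non-critical path
slower_mem = dict(balanced, MEM=3)  # lengthen one path past the others

print(cycle_time(balanced))    # 2
print(cycle_time(faster_id))   # still 2: ID was not the critical path
print(cycle_time(slower_mem))  # 3: MEM is now the critical path
```

Speeding up ID alone buys nothing, while a single slow stage slows every cycle, which is why balanced stages are the ideal.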

Pipelining a MIPS processor
• Instruction Fetch (IF)
  • Read the instruction
• Instruction Decode (ID)
  • Figure out what the incoming instruction is
  • Fetch the operands from the register file
• Execution (EXE)
  • Perform ALU functions
• Memory Access (MEM)
  • Read/write data memory
• Write Back (WB)
  • Write results to the register file

Pipelined datapath
[Figure: MIPS datapath divided into five stages (Instruction Fetch, Instruction Decode, Execution, Memory Access, Write Back) by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Labeled elements: PC, instruction memory, register file, sign-extend, ALU, data memory. Control signals: PCSrc (= Branch & Zero), RegWrite, RegDst, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg.]
Will this work?

Pipelined datapath
[Figure: the pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB, with the following instructions in flight: add $1, $2, $3; lw $4, 0($5); sub $6, $7, $8; sub $9, $10, $11; sw $1, 0($12)]

Pipelined datapath
Is this right?
[Figure: the pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB, with the following instructions in flight: add $1, $2, $3; lw $4, 0($5); sub $6, $7, $8; sub $9, $10, $11; sw $1, 0($12)]

Pipelined datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Labeled elements: PC, instruction memory, register file, sign-extend, ALU, data memory, and the RegDst mux fed by inst[15:11]. Control signals: PCSrc, RegWrite, RegDst, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg.]

Pipelined datapath + control
[Figure: the pipelined datapath with a Control unit. Its control signals are grouped into EX, ME and WB fields that travel through the pipeline registers: ID/EX holds WB, ME and EX; EX/MEM holds WB and ME; MEM/WB holds WB. Control signals shown: PCSrc, RegWrite, RegDst, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg.]

Simplified pipeline diagram
• Use symbols to represent the physical resources, with abbreviations for the pipeline stages.
  • IF, ID, EXE, MEM, WB
• The horizontal axis represents the timeline, the vertical axis the instruction stream
• Example:

add $1, $2, $3    IF  ID  EXE MEM WB
lw $4, 0($5)          IF  ID  EXE MEM WB
sub $6, $7, $8            IF  ID  EXE MEM WB
sub $9, $10, $11              IF  ID  EXE MEM WB
sw $1, 0($12)                     IF  ID  EXE MEM WB
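A diagram like the one above can be generated mechanically. The following is a small sketch for the ideal case (no stalls), where each instruction simply starts one cycle after the previous one; the function name and column width are arbitrary choices.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]


def pipeline_diagram(instructions, col_width=4):
    """Return one text row per instruction, shifted right by one cycle each."""
    label_width = max(len(inst) for inst in instructions) + 2
    lines = []
    for k, inst in enumerate(instructions):
        cells = [""] * k + STAGES          # k empty cycles before IF
        row = inst.ljust(label_width) + "".join(c.ljust(col_width) for c in cells)
        lines.append(row.rstrip())
    return "\n".join(lines)


program = [
    "add $1, $2, $3",
    "lw $4, 0($5)",
    "sub $6, $7, $8",
    "sub $9, $10, $11",
    "sw $1, 0($12)",
]
print(pipeline_diagram(program))
```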

Pipeline hazards

Pipeline hazards
• Even if we divide the pipeline stages perfectly, it is still hard to achieve CPI == 1.
• Pipeline hazards:
  • Structural hazard
    • The hardware does not allow two pipeline stages to work concurrently
  • Data hazard
    • A later instruction in the pipeline depends on the outcome of an earlier instruction that is still in the pipeline
  • Control hazard
    • The processor does not yet know which instruction to fetch next

Structural hazard

Structural hazard
• The hardware cannot support the combination of instructions that we want to execute in the same cycle
• The original pipeline incurs a structural hazard when two instructions compete for the same register in the same cycle (one writing it in WB, the other reading it in ID).
• Solution: write early, read late
  • Writes occur at the clock edge and complete well before the end of the clock cycle.
  • This leaves enough time for the outputs to settle for reads
• The revised register file is the default one from now on!

add $1, $2, $3    IF  ID  EXE MEM WB
lw $4, 0($5)          IF  ID  EXE MEM WB
sub $6, $7, $8            IF  ID  EXE MEM WB
sub $9, $10, $1               IF  ID  EXE MEM WB
sw $1, 0($12)                     IF  ID  EXE MEM WB
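The "write early, read late" idea can be modeled behaviorally: within one cycle the WB write is applied first, and only then are the ID reads served. The class and method names below are hypothetical, and the written value 7 is an arbitrary stand-in for the result of the add.

```python
class RegisterFile:
    """Behavioral model of a register file that writes early and reads late."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def clock_cycle(self, write=None, reads=()):
        # First half of the cycle: the instruction in WB writes its result.
        if write is not None:
            reg, value = write
            if reg != 0:              # $0 is hardwired to zero in MIPS
                self.regs[reg] = value
        # Second half of the cycle: the instruction in ID reads its operands.
        return [self.regs[r] for r in reads]


rf = RegisterFile()
# Cycle 5 of the diagram: add writes $1 in WB while sub $9, $10, $1 reads in ID.
operands = rf.clock_cycle(write=(1, 7), reads=(10, 1))
print(operands)   # the read of $1 already sees the value written this cycle
```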

Data hazard

Data hazard
• Occurs when an instruction in the pipeline needs a value that is not yet available
• Data dependences
  • The output of an instruction is the input of a later instruction
  • May result in a data hazard if the earlier instruction is still in the pipeline when the later instruction needs its result
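As a rough illustration, the sketch below lists the data dependences in the five-instruction sequence used in the stall example. Each instruction is encoded by hand as (text, destination, sources); this encoding and the function names are assumptions made for the example. The `window=2` rule of thumb is also an assumption: in a five-stage pipeline without forwarding and with a write-early, read-late register file, a consumer one or two instructions behind its producer would read too early if nothing stalled.

```python
def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each dependence."""
    deps = []
    last_writer = {}                      # register -> index of its latest writer
    for i, (_text, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


def likely_hazards(deps, window=2):
    """Dependences close enough that the producer has not reached WB yet."""
    return [d for d in deps if d[1] - d[0] <= window]


PROGRAM = [
    ("add $1, $2, $3", "$1", ["$2", "$3"]),
    ("lw $4, 0($1)",   "$4", ["$1"]),
    ("sub $5, $2, $4", "$5", ["$2", "$4"]),
    ("sub $1, $3, $1", "$1", ["$3", "$1"]),
    ("sw $1, 0($5)",   None, ["$1", "$5"]),   # a store writes no register
]

deps = raw_dependences(PROGRAM)
for producer, consumer, reg in likely_hazards(deps):
    print(f"{PROGRAM[consumer][0]!r} needs {reg} from {PROGRAM[producer][0]!r}")
```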

Solution of data hazard I: Stall
• When the source operand of an instruction is not ready, stall the pipeline
  • Suspend the instruction and the instructions that follow it
  • Allow the earlier instructions to proceed
  • This introduces a pipeline bubble: a bubble does nothing and propagates through the pipeline like a nop instruction
• How to stall the pipeline?
  • Disable the PC update
  • Disable the pipeline registers of the earlier pipeline stages
  • When the stall is over, re-enable the pipeline registers and the PC update
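The stall mechanism above can be sketched as a small decision function. The signal names (`pc_write`, `if_id_write`, `insert_bubble`) are hypothetical labels for the three actions in the list, and the model assumes the hazard is checked while the consumer sits in ID, against the destinations of the instructions currently in EXE and MEM.

```python
def stall_control(id_sources, in_flight_dests):
    """Decide whether the instruction in ID must wait this cycle.

    id_sources:      registers read by the instruction in ID
    in_flight_dests: destination registers of instructions in EXE and MEM
                     (their results are not in the register file yet)
    """
    hazard = any(reg in in_flight_dests for reg in id_sources)
    return {
        "pc_write": not hazard,       # freeze the PC while stalling
        "if_id_write": not hazard,    # hold the instruction in IF/ID
        "insert_bubble": hazard,      # send a nop down the later stages
    }


# lw $4, 0($1) is in ID while add $1, $2, $3 is still in EXE: stall.
print(stall_control(["$1"], {"$1"}))
# Once no in-flight instruction writes $1, the pipeline moves again.
print(stall_control(["$1"], set()))
```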

Performance of stall

cycle             1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
add $1, $2, $3    IF  ID  EXE MEM WB
lw $4, 0($1)          IF  ID  ID  ID  EXE MEM WB
sub $5, $2, $4            IF  IF  IF  ID  ID  ID  EXE MEM WB
sub $1, $3, $1                        IF  IF  IF  ID  EXE MEM WB
sw $1, 0($5)                                      IF  ID  ID  ID  EXE MEM WB

15 cycles! CPI == 3 (If there were no stalls, CPI should be just 1!)
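The schedule above can be reproduced with a short timing model. This is a sketch under stated assumptions, not a full simulator: instructions issue in order, there is no forwarding, an instruction waits in ID until every producer of its sources has reached WB (reading in the same cycle as the write is allowed), and the instruction behind it waits in IF. The tuple encoding and function names are illustrative.

```python
def simulate_stalls(program):
    """Return [(text, {cycle: stage})] for an in-order, stall-only pipeline."""
    last_wb = {}                       # register -> WB cycle of its latest writer
    schedule = []
    prev_id_start, prev_exe = 1, 0     # chosen so the first IF lands in cycle 1
    for text, dest, sources in program:
        if_start = prev_id_start                     # IF frees up when predecessor enters ID
        id_start = max(if_start + 1, prev_exe)       # ID frees up when predecessor enters EXE
        ready = max((last_wb.get(r, 0) for r in sources), default=0)
        id_end = max(id_start, ready)                # operands readable in the producer's WB cycle
        row = {c: "IF" for c in range(if_start, id_start)}
        row.update({c: "ID" for c in range(id_start, id_end + 1)})
        row.update({id_end + 1: "EXE", id_end + 2: "MEM", id_end + 3: "WB"})
        schedule.append((text, row))
        if dest is not None:
            last_wb[dest] = id_end + 3
        prev_id_start, prev_exe = id_start, id_end + 1
    return schedule


def total_cycles(schedule):
    return max(max(row) for _text, row in schedule)


PROGRAM = [
    ("add $1, $2, $3", "$1", ["$2", "$3"]),
    ("lw $4, 0($1)",   "$4", ["$1"]),
    ("sub $5, $2, $4", "$5", ["$2", "$4"]),
    ("sub $1, $3, $1", "$1", ["$3", "$1"]),
    ("sw $1, 0($5)",   None, ["$1", "$5"]),
]

sched = simulate_stalls(PROGRAM)
cycles = total_cycles(sched)
for text, row in sched:
    cells = "".join(row.get(c, "").ljust(4) for c in range(1, cycles + 1))
    print((text.ljust(18) + cells).rstrip())
print(f"{cycles} cycles, CPI = {cycles / len(PROGRAM)}")
```

Under these assumptions the model gives the same 15 cycles and CPI of 3 as the slide.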

Solution of data hazard II: Forwarding
• The result is available after the EXE or MEM stage, but it is only made visible (written to the register file) in WB!
• The data is already there, so we should use it right away!
• Also called bypassing

add $1, $2, $3    IF  ID  EXE   <- We obtain the result here!
lw $4, 0($1)          IF  ID
sub $5, $2, $4            IF
sub $1, $3, $1
sw $1, 0($5)
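To see what forwarding could buy on this sequence, here is a rough timing sketch. It rests on assumptions that the slide does not spell out: an ALU result can be forwarded to an EXE stage in the cycle right after the producer's EXE, a loaded value only after the producer's MEM, and every source operand (including store data) is needed at the start of EXE. The encoding and names are illustrative.

```python
def schedule_with_forwarding(program):
    """Return the EXE cycle of each instruction for an in-order pipeline with forwarding."""
    avail = {}        # register -> cycle at the end of which its value can be forwarded
    exe_cycles = []
    prev_exe = 2      # so the first instruction executes in cycle 3 (IF=1, ID=2)
    for _text, dest, sources, is_load in program:
        ready = max((avail.get(r, 0) for r in sources), default=0)
        exe = max(prev_exe + 1, ready + 1)    # in order, and operands must be ready
        exe_cycles.append(exe)
        if dest is not None:
            avail[dest] = exe + 1 if is_load else exe   # loads produce data in MEM
        prev_exe = exe
    return exe_cycles


PROGRAM = [
    ("add $1, $2, $3", "$1", ["$2", "$3"], False),
    ("lw $4, 0($1)",   "$4", ["$1"],       True),
    ("sub $5, $2, $4", "$5", ["$2", "$4"], False),
    ("sub $1, $3, $1", "$1", ["$3", "$1"], False),
    ("sw $1, 0($5)",   None, ["$1", "$5"], False),
]

exe_cycles = schedule_with_forwarding(PROGRAM)
total = exe_cycles[-1] + 2                 # MEM and WB follow the last EXE
ideal = len(PROGRAM) + 4                   # five stages, no stalls
print(exe_cycles, total, total - ideal)
```

In this model only the use of `$4` right after the `lw` still costs a stall cycle, because the loaded value does not exist until the end of MEM.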
