SLIDE 1

PIPELINING: 5-STAGE PIPELINE

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor School of Computing University of Utah

SLIDE 2

Overview

• Announcement
  • Tonight: Homework 1 deadline (11:59 PM)
    • Verify your uploaded files before the deadline
• This lecture
  • Impacts of pipelining on performance
  • The MIPS five-stage pipeline
  • Pipeline hazards
    • Structural hazards
    • Data hazards

SLIDE 3

Single-cycle RISC Architecture

• Example: simple MIPS architecture
  • Critical path includes all of the processing steps

[Figure: single-cycle datapath with PC, Inst. Memory, Register File, ALU, Data Memory, and Controller spanning the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back steps]

SLIDE 4

Single-cycle RISC Architecture

• Example program
  • CT = 6 ns; CPU Time = ?

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

SLIDE 5

Single-cycle RISC Architecture

• Example program
  • CT = 6 ns; CPU Time = 5 x 1 x 6 ns = 30 ns

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

How to improve?
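The CPU-time formula on this slide can be checked with a short sketch (the function name is mine):

```python
# CPU Time = IC x CPI x CT: instruction count, times cycles per
# instruction, times cycle time.
def cpu_time_ns(ic, cpi, ct_ns):
    return ic * cpi * ct_ns

# Single-cycle machine: 5 instructions, CPI = 1, CT = 6 ns
print(cpu_time_ns(5, 1, 6))  # 30
```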

SLIDE 6

Reusing Idle Resources

• Each processing step finishes in a fraction of a cycle
  • Idle resources can be reused for processing the next instructions

[Figure: datapath with PC, Inst. Memory, Register File, ALU, and Data Memory across the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back steps]

SLIDE 7

Pipelined Architecture

• Five-stage pipeline
  • Critical path determines the cycle time

[Figure: pipelined datapath; stage delays are 1.5 ns, 1.5 ns, 1.25 ns, 1.05 ns, and 0.7 ns across the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back stages]

SLIDE 8

Pipelined Architecture

• Example program
  • CT = 1.5 ns; CPU Time = ?

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

SLIDE 9

Pipelined Architecture

• Example program
  • CT = 1.5 ns; CPU Time = 5 x 5 x 1.5 ns = 37.5 ns > 30 ns. WORSE!!
    (CPI = 5 when each instruction runs through all five stages without overlap)

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

SLIDE 10

Pipelined Architecture

• Example program
  • CT = 1.5 ns; CPU Time = ?

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

SLIDE 11

Pipelined Architecture

• Example program
  • CT = 1.5 ns; CPU Time = 9 x 1 x 1.5 ns = 13.5 ns
    (the five instructions overlap and finish in 5 + 4 = 9 cycles)

    AND R1,R2,R3
    XOR R4,R2,R3
    SUB R5,R1,R4
    ADD R6,R1,R4
    MUL R7,R5,R6

  CPU Time = IC x CPI x CT

What is the cost of pipelining?
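The 9-cycle figure follows from overlap: the first instruction occupies all five stages, and each later instruction completes one cycle behind its predecessor. A minimal sketch (names are mine):

```python
# Ideal pipeline: total cycles = stages + (n - 1), because the first
# instruction fills the pipeline and each later one retires a cycle apart.
def pipelined_time_ns(n_insts, n_stages, ct_ns):
    cycles = n_stages + (n_insts - 1)
    return cycles * ct_ns

print(pipelined_time_ns(5, 5, 1.5))  # 13.5
```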

SLIDE 12

Pipelining Technique

• Improving throughput at the expense of latency
  • Delay: D = T + nδ
  • Throughput: IPS = n/(T + nδ)

[Figure: one combinational logic block with critical path delay 30]

SLIDE 13

Pipelining Technique

• Improving throughput at the expense of latency
  • Delay: D = T + nδ
  • Throughput: IPS = n/(T + nδ)

[Figure: the same logic unpipelined (critical path delay 30), split into two stages of delay 15 each, and split into three stages of delay 10 each; for each version, D = ? and IPS = ?]

SLIDE 14

Pipelining Technique

• Improving throughput at the expense of latency
  • Delay: D = T + nδ
  • Throughput: IPS = n/(T + nδ)

[Figure: the same logic pipelined into one stage (delay 30), two stages (delay 15 each), and three stages (delay 10 each)]

  • n = 1: D = 31, IPS = 1/31
  • n = 2: D = 32, IPS = 2/32
  • n = 3: D = 33, IPS = 3/33
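The numbers above follow from the model with T = 30 units of logic delay and δ = 1 unit of pipeline-register overhead per stage (values implied by D = 31, 32, 33 for n = 1, 2, 3). A sketch:

```python
# D = T + n*delta and IPS = n / (T + n*delta), with T = 30 and delta = 1
# inferred from the slide's worked answers.
T, DELTA = 30, 1

def delay(n):
    return T + n * DELTA          # total latency through the pipeline

def throughput(n):
    return n / delay(n)           # completions per unit time, steady state

for n in (1, 2, 3):
    print(n, delay(n), round(throughput(n), 4))
```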

SLIDE 15

Pipelining Latency vs. Throughput

• Theoretical delay and throughput models for perfect pipelining

[Figure: relative performance vs. number of pipeline stages, showing the Delay (D) curve]

SLIDE 16

Pipelining Latency vs. Throughput

• Theoretical delay and throughput models for perfect pipelining

[Figure: relative performance vs. number of pipeline stages, showing both the Delay (D) and Throughput (IPS) curves]

SLIDE 17

Five Stage MIPS Pipeline

SLIDE 18

Simple Five Stage Pipeline

• A pipelined load-store architecture that processes up to one instruction per cycle

[Figure: five-stage datapath with PC, Inst. Memory, Register File, ALU, and Data Memory across the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back stages]

SLIDE 19

Instruction Fetch

• Read an instruction from memory (I-Memory)
  • Use the program counter (PC) to index into the I-Memory
  • Compute NPC by incrementing the current PC
    • What about branches?
• Update pipeline registers
  • Write the instruction into the pipeline registers
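A minimal sketch of the fetch step, assuming a word-addressed instruction list and byte-granular PCs (names are mine):

```python
# Instruction fetch: index I-Memory with the PC, compute NPC = PC + 4.
def fetch(imem, pc):
    inst = imem[pc // 4]  # each 32-bit MIPS instruction occupies 4 bytes
    npc = pc + 4          # next sequential PC; branches are resolved later
    return inst, npc      # both are written into the IF/ID pipeline register
```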

SLIDE 20

Instruction Fetch

[Figure: fetch-stage datapath; the PC indexes the instruction memory, a +4 adder produces NPC, a mux selects between NPC and the branch target, and the instruction and NPC are clocked into the pipeline register. Why increment by 4? NPC = PC + 4]

SLIDE 21

Instruction Fetch

[Figure: the same fetch-stage datapath with three timing paths P1, P2, and P3 marked; Critical Path = Max{P1, P2, P3}]

SLIDE 22

Instruction Decode

• Generate control signals from the opcode bits
• Read source operands from the register file (RF)
  • Use the register specifiers to index the RF
    • How many read ports are required?
• Update pipeline registers
  • Send the operand and immediate values to the next stage
  • Pass control signals and NPC to the next stage
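A sketch of decode using the standard MIPS field positions (the return layout and names are mine):

```python
# Instruction decode: split off the opcode and register specifiers, read
# two source operands (hence two RF read ports), extract the immediate.
def decode(regfile, inst, npc):
    opcode = (inst >> 26) & 0x3F   # control signals derive from these bits
    rs = (inst >> 21) & 0x1F       # first source specifier
    rt = (inst >> 16) & 0x1F       # second source specifier
    a, b = regfile[rs], regfile[rt]
    imm = inst & 0xFFFF            # immediate field of I-type instructions
    return opcode, a, b, imm, npc  # forwarded in the ID/EX pipeline register
```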

SLIDE 23

Instruction Decode

[Figure: decode-stage datapath; the instruction and NPC arrive from the fetch pipeline register, the decoder produces control signals and the target fields, the register file reads two source registers, and the register values, control signals, and NPC are written into the next pipeline register]

SLIDE 24

Execute Stage

• Perform the ALU operation
  • Compute the ALU result
    • Operation type: control signals
    • First operand: contents of a register
    • Second operand: either a register or the immediate value
  • Compute the branch target
    • Target = NPC + immediate
• Update pipeline registers
  • Control signals, branch target, ALU result, and destination
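A sketch of the execute step under an assumed control encoding (the `alu_ops` table and flag names are mine):

```python
# Execute: pick the second operand (register or immediate), apply the ALU
# operation selected by control, and compute the branch target.
def execute(alu_ops, op, a, b, imm, npc, use_imm):
    src2 = imm if use_imm else b
    result = alu_ops[op](a, src2)  # operation type comes from control signals
    target = npc + imm             # branch target = NPC + immediate
    return result, target          # into the EX/MEM pipeline register

alu_ops = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}
print(execute(alu_ops, "add", 5, 7, 100, 4, False))  # (12, 104)
```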

SLIDE 25

Execute Stage

[Figure: execute-stage datapath; the ALU takes one register operand and either a second register or the immediate under control signals, an adder computes NPC + immediate as the branch target, and the result, target, and control signals go into the next pipeline register]

SLIDE 26

Memory Access

• Access data memory
  • Load/store address: the ALU outcome
  • Control signals determine read or write access
• Update pipeline registers
  • ALU result from execute
  • Loaded data from D-Memory
  • Destination register
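A sketch of the memory step, modeling D-Memory as a word-addressed dict (names are mine):

```python
# Memory access: the ALU result is the load/store address; control
# signals decide read vs. write. The ALU result and any loaded data
# both continue on to write-back.
def memory_access(dmem, alu_result, store_val, is_load, is_store):
    loaded = None
    if is_load:
        loaded = dmem[alu_result]      # read access
    elif is_store:
        dmem[alu_result] = store_val   # write access
    return alu_result, loaded

dmem = {8: 42}
print(memory_access(dmem, 8, None, True, False))  # (8, 42)
```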

SLIDE 27

Memory Access

[Figure: memory-stage datapath; the ALU result addresses the data memory, control signals select read or write, and the ALU result, loaded data, destination register, and control signals are written into the next pipeline register]

SLIDE 28

Register Write Back

• Update the register file
  • Control signals determine whether a register write is needed
  • Only one write port is required
    • Write the ALU result to the destination register, or
    • Write the loaded data into the register file
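A sketch of write-back; the single write port commits either the ALU result or the loaded data (the flag names are mine):

```python
# Write back: control decides whether to write at all, and whether the
# value comes from the ALU or from memory. One write port suffices.
def write_back(regfile, dest, alu_result, load_data, reg_write, mem_to_reg):
    if reg_write:
        regfile[dest] = load_data if mem_to_reg else alu_result

regs = [0] * 32
write_back(regs, 5, 99, 42, True, True)  # a load: commit the memory data
print(regs[5])  # 42
```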

SLIDE 29

Five Stage Pipeline

• Ideal pipeline: IPC = 1
  • Are there enough resources to keep the pipeline stages busy all the time?

[Figure: full five-stage pipeline with PC and +4 adder, instruction memory, register file, ALU, data memory, and register-file write-back across the Inst. Fetch, Decode, Execute, Memory, and Writeback stages]

SLIDE 30

Pipeline Hazards

SLIDE 31

Pipeline Hazards

• Structural hazards: multiple instructions compete for the same resource
• Data hazards: a dependent instruction cannot proceed because it needs a value that hasn't been produced
• Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown

SLIDE 32

Structural Hazards

• 1. Unified memory for instructions and data

    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]

SLIDE 33

Structural Hazards

• 1. Unified memory for instructions and data

    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]

  Solution: separate instruction and data memories.

SLIDE 34

Structural Hazards

• 1. Unified memory for instructions and data
• 2. Register file with shared read/write access ports

    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]

SLIDE 35

Structural Hazards

• 1. Unified memory for instructions and data
• 2. Register file with shared read/write access ports

    R1 ← Mem[R2]
    R7 ← R1 + R0
    R6 ← R4 - R5
    R3 ← Mem[R20]

  Solution: access the register file in half cycles (write in the first half, read in the second).
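The half-cycle fix can be illustrated with a toy register file in which write-back uses the first half of the cycle and decode reads in the second half, so a read and a write in the same cycle no longer conflict (class and method names are mine):

```python
# Half-cycle register access: writes land in the first half of a cycle
# and reads happen in the second half, so a read in the same cycle as a
# write already observes the new value.
class RegFile:
    def __init__(self):
        self.regs = [0] * 32

    def write_first_half(self, dest, val):
        if dest != 0:               # register 0 is hardwired to zero in MIPS
            self.regs[dest] = val

    def read_second_half(self, src):
        return self.regs[src]

rf = RegFile()
rf.write_first_half(1, 42)          # write-back of an older instruction
print(rf.read_second_half(1))       # 42: a younger instruction's decode
```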