Pipelining Dr. Soner Onder CS 4431 Michigan Technological - PowerPoint PPT Presentation

Lecture 3: Pipelining
Dr. Soner Onder
CS 4431, Michigan Technological University
9/28/2020

A "Typical" RISC ISA
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPR (R0 contains zero, DP take pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
  • no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Example: MIPS

Register-Register
  bits:   31-26   25-21   20-16   15-11   10-0
  field:  Op      Rs1     Rs2     Rd      Opx

Register-Immediate
  bits:   31-26   25-21   20-16   15-0
  field:  Op      Rs1     Rd      immediate

Branch
  bits:   31-26   25-21   20-16     15-0
  field:  Op      Rs1     Rs2/Opx   immediate

Jump / Call
  bits:   31-26   25-0
  field:  Op      target
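A minimal sketch of how the four formats on this slide could be unpacked from a 32-bit word. The function names `bits` and `decode`, the format labels, and the dictionary keys are made up for illustration. Treating Opx as bits 10-0 of the register-register format is an assumption read off the slide's field layout; the real MIPS encoding subdivides those bits further.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned int."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word, fmt):
    """Split an instruction word into the slide's fields.

    fmt is one of "rr", "ri", "branch", "jump" (hypothetical labels).
    """
    fields = {"op": bits(word, 31, 26)}
    if fmt == "rr":
        fields.update(rs1=bits(word, 25, 21), rs2=bits(word, 20, 16),
                      rd=bits(word, 15, 11), opx=bits(word, 10, 0))
    elif fmt == "ri":
        fields.update(rs1=bits(word, 25, 21), rd=bits(word, 20, 16),
                      imm=bits(word, 15, 0))
    elif fmt == "branch":
        fields.update(rs1=bits(word, 25, 21), rs2_opx=bits(word, 20, 16),
                      imm=bits(word, 15, 0))
    elif fmt == "jump":
        fields.update(target=bits(word, 25, 0))
    else:
        raise ValueError(f"unknown format: {fmt}")
    return fields


# Hand-built register-register word: op=0, rs1=2, rs2=3, rd=1, opx=32
word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | 32
print(decode(word, "rr"))
```

Because every format keeps Op in bits 31-26 and Rs1 in bits 25-21, a decoder can start reading registers before it knows which format it is looking at, which is one reason fixed-format instructions suit pipelining.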

Datapath vs Control
[Figure: Datapath and Controller blocks; the controller drives the datapath's Control Points and the datapath returns signals]
• Datapath: Storage, FU, interconnect sufficient to perform the desired functions
  • Inputs are Control Points
  • Outputs are signals
• Controller: State machine to orchestrate operation on the data path
  • Based on desired function and signals

Approaching an ISA
• Instruction Set Architecture
  • Defines set of operations, instruction format, hardware supported data types, named storage, addressing modes, sequencing
• Meaning of each instruction is described by RTL on architected registers and memory
• Given technology constraints assemble adequate datapath
  • Architected storage mapped to actual storage
  • Function units to do all the required operations
  • Possible additional storage (e.g. MAR, MBR, …)
  • Interconnect to move information among regs and FUs
• Map each instruction to sequence of RTLs
• Collate sequences into symbolic controller state transition diagram (STD)
• Lower symbolic STD to control points
• Implement controller

Pipelining: It's Natural!
• Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one after another]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to midnight; loads A, B, C, D overlapped, with segments of 30, 40, 40, 40, 40, 20 minutes]
• Pipelined laundry takes 3.5 hours for 4 loads
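The 6-hour and 3.5-hour figures can be checked with a short calculation. This is a sketch only: the function names are invented here, and it assumes a finished load can wait in a basket until the next machine is free.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load starts a stage as soon as it and the machine are both ready."""
    # finish[s] = time at which stage s finished its most recent load
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]  # washer, dryer, "folder"
print(sequential_minutes(stages, 4) / 60)  # 6.0
print(pipelined_minutes(stages, 4) / 60)   # 3.5
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the 40-minute dryer is busy continuously once the first wash is done, so it sets the pace.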

Pipelining Lessons
[Figure: pipelined laundry timeline, 6 PM to 9 PM, loads A to D with segments of 30, 40, 40, 40, 40, 20 minutes]
• Pipelining doesn't help latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to "fill" pipeline and time to "drain" it reduces speedup
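A rough way to see the last three lessons in numbers. The sketch below assumes identical tasks and unlimited waiting room between stages, so the first task takes the sum of the stage times and every later task finishes one slowest-stage time after its predecessor. The name `pipeline_speedup` is illustrative.

```python
def pipeline_speedup(stage_times, tasks):
    """Unpipelined time divided by pipelined time for identical tasks."""
    unpipelined = tasks * sum(stage_times)
    # Fill the pipe once, then one result per slowest-stage interval.
    pipelined = sum(stage_times) + (tasks - 1) * max(stage_times)
    return unpipelined / pipelined


balanced = [30, 30, 30]  # hypothetical equal-length stages
laundry = [30, 40, 20]   # washer, dryer, "folder"
for n in (4, 1000):
    print(n, round(pipeline_speedup(balanced, n), 2),
          round(pipeline_speedup(laundry, n), 2))
```

With balanced stages the speedup approaches the number of stages (3) as the number of tasks grows. With the laundry stage times it can never exceed 90/40 = 2.25, and for only 4 loads the fill and drain time holds it to 360/210, about 1.71.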

5 Steps of MIPS Datapath (Figure A.2, Page A-8)
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: unpipelined datapath with Next PC mux, Next SEQ PC adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), ALU with Zero? test, data memory (LMD), and WB Data mux]
IR <= mem[PC]; PC <= PC + 4
Reg[IR_rd] <= Reg[IR_rs] op_IRop Reg[IR_rt]

5 Steps of MIPS Datapath (Figure A.3, Page A-9)
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages; Next SEQ PC and RD are carried along through the pipeline registers]
IR <= mem[PC]; PC <= PC + 4
A <= Reg[IR_rs]; B <= Reg[IR_rt]
rslt <= A op_IRop B
WB <= rslt
Reg[IR_rd] <= WB
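The register transfers on this slide can be traced step by step for a single register-register instruction. This is a minimal sketch, not a model of the actual hardware: the dictionary-based instruction format, the `OPS` table, and the function name are all assumptions made for illustration.

```python
import operator

# Hypothetical opcode table; only a few ALU operations are modelled.
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}


def run_rr_instruction(pc, imem, regs):
    """Carry one register-register instruction through the five steps."""
    # Instruction fetch: IR <= mem[PC]; PC <= PC + 4
    ir = imem[pc]
    pc = pc + 4
    # Decode / register fetch: A <= Reg[IR_rs]; B <= Reg[IR_rt]
    a = regs[ir["rs"]]
    b = regs[ir["rt"]]
    # Execute: rslt <= A op B
    rslt = OPS[ir["op"]](a, b)
    # Memory access: nothing to do for this instruction type; WB <= rslt
    wb = rslt
    # Write back: Reg[IR_rd] <= WB
    regs[ir["rd"]] = wb
    return pc


regs = [0] * 32
regs[2], regs[3] = 7, 5
imem = {0: {"op": "add", "rd": 1, "rs": 2, "rt": 3}}  # add r1,r2,r3
next_pc = run_rr_instruction(0, imem, regs)
print(next_pc, regs[1])  # 4 12
```

In the pipelined datapath each commented step runs in a different clock cycle, and the local variables `ir`, `a`, `b`, `rslt`, and `wb` correspond to values held in the pipeline registers between stages.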

Inst. Set Processor Controller
[Figure: controller state diagram; instruction types JSR, JR, ST, jmp, br, LD, RI, RR branch out after decode]
Ifetch: IR <= mem[PC]; PC <= PC + 4
opFetch-DCD: A <= Reg[IR_rs]; B <= Reg[IR_rt]
br: if bop(A,b) PC <= PC + IR_im
jmp: PC <= IR_jaddr
RR: r <= A op_IRop B; WB <= r; Reg[IR_rd] <= WB
RI: r <= A op_IRop IR_im; WB <= r; Reg[IR_rd] <= WB
LD: r <= A + IR_im; WB <= Mem[r]; Reg[IR_rd] <= WB

5 Steps of MIPS Datapath (Figure A.3, Page A-9)
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
• Data stationary control
  • local decode for each instruction phase / pipeline stage

Visualizing Pipelining (Figure A.2, Page A-8)
[Figure: time in clock cycles (Cycle 1 to Cycle 7) across, instruction order down; four instructions, each passing through Ifetch, Reg, ALU, DMem, Reg, with each instruction starting one cycle after the one before it]
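The diagram can be regenerated as text. The sketch below assumes an ideal pipeline with no stalls, so instruction i simply enters stage s in cycle i + s + 1; `pipeline_diagram` is a name chosen here for illustration.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage in cycle c+1 ('' if idle)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # one stage per cycle, offset by program order
        rows.append(row)
    return rows


for row in pipeline_diagram(4):
    print(" ".join(f"{cell:<7}" for cell in row))
```

Reading a column of the output shows which stages are busy in that cycle; four instructions need 4 + 5 - 1 = 8 cycles in total.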

Pipelining is not quite that easy!
• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
  • Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  • Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
  • Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

One Memory Port/Structural Hazards (Figure A.4, Page A-14)
[Figure: time in clock cycles (Cycle 1 to Cycle 7) across, instruction order down; Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Ifetch, Reg, ALU, DMem, Reg]

One Memory Port/Structural Hazards (Similar to Figure A.5, Page A-15)
[Figure: time in clock cycles (Cycle 1 to Cycle 7) across, instruction order down; Load, Instr 1, Instr 2, then a Stall row of bubbles, then Instr 3]
How do you "bubble" the pipe?
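One way to picture the stall is as a fetch that is held back whenever the single memory port is already serving a data access. This is a sketch under simplifying assumptions: the data access always wins the port, only instruction fetch is delayed, and nothing else stalls. `fetch_cycles` and `MEM_OFFSET` are illustrative names.

```python
MEM_OFFSET = 3  # DMem is the fourth stage, three cycles after Ifetch


def fetch_cycles(uses_data_memory):
    """Given one bool per instruction, return the cycle in which each is fetched."""
    busy = set()  # cycles in which the memory port serves a load or store
    cycles = []
    cycle = 0
    for is_mem in uses_data_memory:
        cycle += 1
        while cycle in busy:  # port taken by an earlier instruction: bubble
            cycle += 1
        cycles.append(cycle)
        if is_mem:
            busy.add(cycle + MEM_OFFSET)
    return cycles


# Load, Instr 1, Instr 2, Instr 3, Instr 4 (only the load touches data memory)
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

The load's data access in cycle 4 collides with the fetch of Instr 3, so Instr 3 and everything behind it slip by one cycle, which matches the Stall row in the figure.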

Data Hazard on R1 (Figure A.6, Page A-17)
[Figure: time in clock cycles across (stages IF, ID/RF, EX, MEM, WB), instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
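Which of these readers of r1 are in trouble depends on how far behind the add they are. The sketch below flags readers within a fixed distance of the writer. The default distance of 2 assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second half; both are assumptions, as are the function names.

```python
def parse(instr):
    """'add r1,r2,r3' -> ('add', 'r1', ['r2', 'r3'])"""
    op, operands = instr.split(None, 1)
    dest, *srcs = [r.strip() for r in operands.split(",")]
    return op, dest, srcs


def raw_hazards(program, window=2):
    """List (producer, consumer, register) triples at most `window` instructions apart."""
    parsed = [parse(i) for i in program]
    found = []
    for i, (_, dest, _) in enumerate(parsed):
        for j in range(i + 1, min(i + 1 + window, len(parsed))):
            if dest in parsed[j][2]:
                found.append((program[i], program[j], dest))
            if parsed[j][1] == dest:
                break  # register overwritten; later readers depend on j instead
    return found


program = ["add r1,r2,r3", "sub r4,r1,r3", "and r6,r1,r7",
           "or r8,r1,r9", "xor r10,r1,r11"]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer!r} reads {reg} too early after {producer!r}")
```

Under these assumptions `sub` and `and` are flagged, while `or` reads r1 in the same cycle the add writes it and `xor` reads it a cycle later. Calling `raw_hazards(program, window=3)` models a register file without the split-cycle trick and flags `or` as well.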

Dependences and hazards
• Dependences are a program property:
  • If two instructions are data dependent they cannot execute simultaneously.
  • Existence of control-dependences means serialization.
• Whether a dependence results in a hazard and whether that hazard actually causes a stall are properties of the pipeline organization.
• Data dependences may occur through registers or memory.

Dependences and hazards
• The presence of the dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline.
• A data dependence:
  • Indicates that there is a possibility of a hazard,
  • Determines the order in which results must be calculated, and
  • Sets an upper bound on the amount of parallelism that can be exploited.

Dependencies
• Name dependencies
  • Output dependence
  • Anti-dependence
• Data
  • True dependence
• Control

Data dependences
• Data dependence, true dependence, and true data dependence are terms used to mean the same thing:
• An instruction j is data dependent on instruction i if either of the following holds:
  • instruction i produces a result that may be used by instruction j, or
  • instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
• Chains of dependent instructions.

Name dependences
• Output dependence:
  • When instructions i and j write the same register or memory location. The ordering must be preserved to leave the correct value in the register:
      add r7,r4,r3
      div r7,r2,r8
• Antidependence:
  • When instruction j writes a register or memory location that instruction i reads:
      i: add r6,r5,r4
      j: sub r5,r8,r11
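The three register dependence kinds can be told apart by comparing destination and source registers. A minimal sketch, assuming each instruction is written by hand as a (destination, sources) pair and that only registers, not memory, are considered; `classify` is an illustrative name.

```python
def classify(first, second):
    """Return the kinds of dependence of `second` on the earlier `first`.

    Each instruction is a (dest_register, [source_registers]) pair.
    """
    dest_i, srcs_i = first
    dest_j, srcs_j = second
    kinds = set()
    if dest_i in srcs_j:
        kinds.add("true")    # j reads what i writes
    if dest_j in srcs_i:
        kinds.add("anti")    # j writes what i reads
    if dest_i == dest_j:
        kinds.add("output")  # both write the same register
    return kinds


add_r7 = ("r7", ["r4", "r3"])   # add r7,r4,r3
div_r7 = ("r7", ["r2", "r8"])   # div r7,r2,r8
add_r6 = ("r6", ["r5", "r4"])   # add r6,r5,r4
sub_r5 = ("r5", ["r8", "r11"])  # sub r5,r8,r11
print(classify(add_r7, div_r7))  # {'output'}
print(classify(add_r6, sub_r5))  # {'anti'}
```

Name dependences involve no flow of values between the two instructions, which is why renaming a register (for example, giving `div` a different destination) removes them, whereas a true dependence cannot be renamed away.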

Data Dependences through registers/memory
• Dependences through registers are easy:
      lw  r10,10(r11)
      add r12,r10,r8
  • just compare register names.
• Dependences through memory are harder:
      sw r10,4(r2)
      lw r6,0(r4)
  • is r2+4 = r4+0? If so they are dependent, if not, they are not.
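The memory case comes down to comparing effective addresses, which needs the register values and not just the register names. A small sketch, assuming both accesses are the same width and aligned so that equal addresses is the whole test; the function names and register values are made up.

```python
def effective_address(regs, base, displacement):
    """Base + displacement addressing."""
    return regs[base] + displacement


def same_address(regs, store, load):
    """store and load are (base_register, displacement) pairs."""
    return effective_address(regs, *store) == effective_address(regs, *load)


store = ("r2", 4)  # sw r10,4(r2)
load = ("r4", 0)   # lw r6,0(r4)
print(same_address({"r2": 100, "r4": 104}, store, load))  # True
print(same_address({"r2": 100, "r4": 200}, store, load))  # False
```

The same pair of instructions is dependent in the first call and independent in the second, which is why this check generally cannot be settled by looking at the instruction text alone.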

Control dependences
• An instruction j is control dependent on i if the execution of j is controlled by instruction i.
      i: if (a < b)
      j:     a = a + 1;
  j is control dependent on i.
1. An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch.
2. An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch.
