Lecture 10: Processor design – pipelining

• Overlapping the execution of instructions
• Pipeline hazards
  – Different types
  – How to remove them

Inf2C Computer Systems - 2011-2012

Pipelining

• Classic case: make all instructions take 5 steps, e.g.:

    lw r1, n(r2)    # r1 = memory[n+r2]

  Step  Name  Datapath operation
  0     IF    Fetch instruction; PC+4 → PC
  1     REG   Get value from r2
  2     ALU   ALU computes n+r2
  3     MEM   Get data from memory
  4     WB    Write memory data into r1

  IF  = instruction fetch (includes PC increment)
  REG = fetching values from general purpose registers
  ALU = arithmetic/logic operations
  MEM = memory access
  WB  = write back results to general purpose registers

Pipelining

• Start one instruction per clock cycle

  cycle  1    2    3    4    5    6    7    8    9
         IF   REG  ALU  MEM  WB
              IF   REG  ALU  MEM  WB
                   IF   REG  ALU  MEM  WB
                        IF   REG  ALU  MEM  WB
                             IF   REG  ALU  MEM  WB
  (instruction flow runs from top to bottom)

• Five instructions are being executed (in different stages) during the same cycle
• Each instruction still takes 5 cycles, but one instruction now completes every cycle: CPI → 1
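The staircase pattern on this slide can be generated mechanically. The following is a minimal sketch, assuming an ideal pipeline with no hazards; the function names `pipeline_chart` and `busy_in_cycle` are illustrative and not part of the lecture.

```python
# Stage names as used on the slides.
STAGES = ["IF", "REG", "ALU", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction.

    row[c] holds the stage the instruction occupies in cycle c+1,
    or "" if the instruction is not in the pipeline in that cycle.
    Assumes an ideal pipeline: instruction i enters IF in cycle i+1.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        chart.append(row)
    return chart


def busy_in_cycle(chart, cycle):
    """Count how many instructions are in some stage during a 1-based cycle."""
    return sum(1 for row in chart if row[cycle - 1])


if __name__ == "__main__":
    chart = pipeline_chart(5)
    print("cycle " + "".join(f"{c:<5}" for c in range(1, len(chart[0]) + 1)))
    for row in chart:
        print("      " + "".join(f"{stage:<5}" for stage in row))
```

With five instructions the chart spans nine cycles, and cycle 5 is the one in which all five stages are occupied at once.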

Preparing instructions for pipelining

• Stretch the execution of every instruction to the maximum number of cycles, e.g.

    sw r1, n(r2)    # memory[n+r2] = r1

  IF    Fetch instruction; PC+4 → PC
  REG   Get values of r1 and r2 from registers
  ALU   ALU computes n+r2
  MEM   Store value of r1 to memory
  WB    Do nothing

    add r1, r2, r3    # r1 = r2+r3

  IF    Fetch instruction; PC+4 → PC
  REG   Get values of r2 and r3 from registers
  ALU   ALU computes r2+r3
  MEM   Do nothing
  WB    Write result to r1

Execution speedup

[Figure: timing diagram of instructions passing through the IF, REG, ALU, MEM and WB stages over cycles 1 to 15]

• Speed-up roughly equal to the number of stages
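The claim that the speed-up approaches the number of stages can be checked with a short calculation. This is a sketch under two assumptions that the slide does not state explicitly: the clock period is the same with and without pipelining, and there are no hazards. The function names are illustrative.

```python
def unpipelined_cycles(n, stages=5):
    """Cycles to run n instructions one after another, each taking `stages` cycles."""
    return n * stages


def pipelined_cycles(n, stages=5):
    """Cycles to run n instructions in an ideal pipeline.

    The first instruction needs `stages` cycles to drain through the
    pipeline; each further instruction completes one cycle later.
    """
    return stages + (n - 1)


def speedup(n, stages=5):
    """Ratio of unpipelined to pipelined cycle counts for n instructions."""
    return unpipelined_cycles(n, stages) / pipelined_cycles(n, stages)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, round(speedup(n), 3))
```

For a single instruction there is no gain at all; the ratio n*k / (k + n - 1) only approaches k (here 5) as the number of instructions n grows large.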

Pipeline hazards

• Complications in pipelining, called hazards
  – Structural
  – Data
  – Control
• Hazards limit the speedup achieved: CPI rises above 1

Structural hazards

• Example: instructions in IF and MEM stages may conflict for access to memory (cache)

  lw    IF   REG  ALU  MEM  WB
  I1         IF   REG  ALU  MEM  WB
  I2              IF   REG  ALU  MEM  WB
  I3                   --   IF   REG  ALU  MEM  WB

  -- = "bubble"

Structural hazards

• Not enough hardware resources to execute a combination of instructions in the same clock cycle
• Straightforward solution: use more resources
  – E.g. split cache into instruction cache (used in IF) and data cache (used in MEM)
• Good design: provide enough resources to avoid hazards for common/frequent cases

Data hazards

• One instruction must use a value produced by a previous instruction
• Example:

    add  r2, r1, r5
    lw   r3, 4(r1)
    addi r4, r3, n

  add   IF   REG  ALU  MEM  WB
  lw         IF   REG  ALU  MEM  WB
  addi            IF   --   --   --   REG  ALU  MEM  WB
                                      IF   REG  ALU  MEM  WB

  -- = stall cycle (3 cycle stall)

Data hazards

• Processor must detect hazards and insert bubbles
• Solution: compiler can separate dependent instructions

    lw   r3, 4(r1)
    add  r2, r1, r5
    addi r4, r3, n

  lw    IF   REG  ALU  MEM  WB
  add        IF   REG  ALU  MEM  WB
  addi            IF   --   --   REG  ALU  MEM  WB
                                 IF   REG  ALU  MEM  WB

  -- = stall cycle (2 cycle stall)

Data forwarding

• The data is actually available before the end of WB
• Why not forward it directly to the unit/stage where it is needed?

  add   IF   REG  ALU  MEM  WB
  lw         IF   REG  ALU  MEM  WB
  addi            IF   --   REG  ALU  MEM  WB
                            IF   REG  ALU  MEM  WB

  -- = stall cycle (1 cycle stall)
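The stall counts quoted on the data hazard and forwarding slides (3, 2 and 1 cycles) all follow from one small timing rule. The sketch below is a simplified model rather than the lecture's own formulation: it assumes a value becomes usable only in the cycle after the stage that produces it, so a register written in WB cannot be read by REG in the same cycle. The function name `stall_cycles` and its parameters are illustrative.

```python
# Position of each stage within the 5-stage pipeline from the slides.
STAGE_INDEX = {"IF": 0, "REG": 1, "ALU": 2, "MEM": 3, "WB": 4}


def stall_cycles(distance, ready_stage, need_stage):
    """Stall cycles a dependent instruction must wait.

    distance:    how many instruction slots after the producer the
                 consumer is issued (1 = immediately after).
    ready_stage: stage at the end of which the producer's value exists
                 (WB without forwarding; MEM for a forwarded load).
    need_stage:  stage in which the consumer needs the value
                 (REG without forwarding; ALU with forwarding).

    Assumption: the value is usable only in the cycle AFTER ready_stage.
    """
    gap = STAGE_INDEX[ready_stage] - STAGE_INDEX[need_stage] - distance + 1
    return max(0, gap)


if __name__ == "__main__":
    # lw followed directly by the dependent addi, no forwarding
    print(stall_cycles(1, "WB", "REG"))
    # compiler moves an independent add in between, no forwarding
    print(stall_cycles(2, "WB", "REG"))
    # lw followed directly by addi, load data forwarded from MEM to ALU
    print(stall_cycles(1, "MEM", "ALU"))
```

Under the same model, an ALU result forwarded to the next instruction's ALU stage needs no stall at all, which is why only the load case still costs a cycle.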

Control hazards

• Before a conditional branch instruction is resolved, the processor does not know where to fetch the next instruction from
• Example:

    beq r1, r2, n

  IF    Fetch instruction; PC+4 → PC
  REG   Get values of r1 and r2 from registers
  ALU   ALU computes r1-r2 and PC+n
  MEM   If r1-r2==0 update PC
  WB    Do nothing

• Branch is identified in IF but only resolved in MEM

Control hazards

  beq   IF   REG  ALU  MEM  WB
             --   --   --   IF   REG  ALU  MEM  WB
                                 IF   REG  ALU  MEM  WB

  -- = branch latency

Branch prediction

• Solution: predict outcome of branch
  – If prediction correct, bubble is reduced or eliminated
  – If prediction incorrect, processor must discard (“flush” or “squash”) incorrectly loaded instructions

  beq   IF   REG  ALU  MEM  WB
             IF   REG  ALU  MEM  WB         <- flushed on misprediction
                  IF   REG  ALU  MEM  WB    <- flushed on misprediction
                       IF   REG  ALU  MEM  WB   <- flushed on misprediction
                            IF   REG  ALU  MEM  WB
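The cost of mispredictions can be folded into an average CPI figure. This is a rough sketch, not a formula from the lecture: it assumes a base CPI of 1, a fixed penalty of 3 flushed instructions per misprediction (the branch is resolved in MEM, the fourth stage), and no other hazards. The function name and the example rates are hypothetical.

```python
def effective_cpi(branch_fraction, mispredict_rate, penalty_cycles=3, base_cpi=1.0):
    """Average cycles per instruction when only branch mispredictions cost extra.

    branch_fraction: fraction of executed instructions that are branches.
    mispredict_rate: fraction of those branches that are predicted wrongly.
    penalty_cycles:  cycles lost per misprediction (assumed 3 here, because
                     the branch outcome is only known in the MEM stage).
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches.
    for rate in (1.0, 0.5, 0.1, 0.0):
        print(rate, round(effective_cpi(0.2, rate), 3))
```

Setting `mispredict_rate` to 1.0 models a processor that always pays the full branch latency, while 0.0 recovers the ideal CPI of 1.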

Is this the end in performance improvement?

• Superscalar processors:
  – Can fetch more than 1 instruction per cycle
  – Have multiple pipelines and ALUs to execute multiple instructions simultaneously
• Predicated execution (against control hazards):
  – Execute instructions from both targets of the branch simultaneously and discard the incorrect one (e.g. IA-64)
• Value prediction (against data hazards):
  – Predict result of instructions
• Multiprocessors
