Pipelining Performance Measurements

Cycle               1    2    3    4    5    6    7    8    Time->
add $s0, $0, $0     IF   ID   EX   MEM  WB
lw  $s1, 0($t0)          IF   ID   EX   MEM  WB
sw  $s2, 0($t1)               IF   ID   EX   MEM  WB
or  $s3, $s4, $t3                  IF   ID   EX   MEM  WB

The machine in cycle 4: add is in MEM, lw is in EX, sw is in ID, or is in IF.

The machine in cycle 5: add is in WB, lw is in MEM, sw is in EX, or is in ID.

In what cycle was $s1 written? 6 (WB of the lw)
In what cycle was $s4 read? 5 (ID of the or)
In what cycle was the add executed? 3 (EX of the add)
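The three answers follow from one rule for a pipeline with no stalls: the instruction in issue position i (counting from 0) occupies stage s (counting from 0) in cycle i + s + 1. A minimal Python sketch, assuming the five stages IF, ID, EX, MEM, WB; the names `STAGES` and `stage_cycle` are illustrative and not from the slides.

```python
# Ideal five-stage pipeline with no stalls (stage names are an assumption).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(issue_index: int, stage: str) -> int:
    """Return the 1-based cycle in which the instruction at the given
    0-based issue position occupies the given stage."""
    return issue_index + STAGES.index(stage) + 1


# Program order used in the timing diagram.
program = [
    "add $s0, $0, $0",
    "lw $s1, 0($t0)",
    "sw $s2, 0($t1)",
    "or $s3, $s4, $t3",
]

print("lw writes $s1 in cycle", stage_cycle(1, "WB"))
print("or reads $s4 in cycle", stage_cycle(3, "ID"))
print("add executes in cycle", stage_cycle(0, "EX"))
```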

Performance Analysis
• Measurements related to our machine
• Job = single instruction
• Latency: Time to finish a complete instruction, start to finish.
• Throughput: Average number of instructions completed per unit time.
• Which is more important for reducing program execution time?

Pipelined Machine
[Figure: pipelined datapath with Fetch, Decode, Execute, Memory, and (Writeback) stages separated by pipeline registers. Labeled components: PC, Instruction Memory, Register File, Sign Ext (16 to 32 bits), Data Memory.]

Pipeline Registers
• Named for two stages they separate
• Store all data corresponding to lines that go through them
  - IF/ID: 32b instruction, 32b nPC
  - ID/EX: 32b register value, 32b register, 32b immediate field, 32b nPC
  - EX/MEM: Zero, 32b ALU result, 32b nPC, 32b register
  - MEM/WB: 32b ALU result, 32b memory value

Register File
• Only takes half of a cycle to read or write to register file
• Convention:
  - Read 2nd half of cycle
  - Write 1st half of cycle

Machine Comparison
Stage:  Fetch  Decode  Execute  Memory  WriteBack
Delay:  2 ns   1 ns    2 ns     2 ns    1 ns
Pipeline register delay: 0.1 ns
Single-Cycle Implementation
• Clock cycle time: 8 ns
• Latency of a single instruction: 8 ns
• Throughput for machine: 1/8 inst/ns
Pipelined Implementation
• Clock cycle time: 2.1 ns
• Latency of a single instruction: 2.1 * 5 = 10.5 ns
• Throughput for machine: 1/2.1 inst/ns
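The comparison can be reproduced with a few lines of arithmetic. This is a sketch under the slide's assumptions: the single-cycle clock is the sum of all stage delays, the pipelined clock is the slowest stage plus the pipeline register delay, and throughput assumes a full pipeline with no stalls. The function names are illustrative.

```python
def single_cycle_metrics(stage_ns):
    """One long cycle covers every stage, so cycle time equals latency."""
    cycle = sum(stage_ns)
    return {"cycle_ns": cycle, "latency_ns": cycle, "throughput": 1 / cycle}


def pipelined_metrics(stage_ns, reg_delay_ns):
    """The clock must fit the slowest stage plus the pipeline register delay.
    Throughput is one instruction per cycle, assuming no stalls."""
    cycle = max(stage_ns) + reg_delay_ns
    return {
        "cycle_ns": cycle,
        "latency_ns": cycle * len(stage_ns),
        "throughput": 1 / cycle,
    }


# Fetch, Decode, Execute, Memory, WriteBack delays in ns.
stages = [2, 1, 2, 2, 1]
print(single_cycle_metrics(stages))
print(pipelined_metrics(stages, 0.1))
```

Note that pipelining makes the latency of one instruction worse (10.5 ns against 8 ns) while improving throughput by almost a factor of four.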

Example 2 – How do we speed up pipelined machine?
Stage:  Fetch  Decode  Execute  Memory  Writeback
Delay:  6 ns   4 ns    8 ns     10 ns   4 ns
Pipeline register delay: 0.1 ns
Single cycle: 1/32 inst/ns
Pipelined: 1/10.1 inst/ns

Example 2 – Split more stages
Stage:  Fetch  Decode  Execute  Memory  Writeback
Delay:  6 ns   4 ns    8 ns     10 ns   4 ns
Pipeline register delay: 0.1 ns
Which stage(s) should we split? Memory and Execute

Example 2 – After Split
Stage:  F     D     X1    X2    M1    M2    WB
Delay:  6 ns  4 ns  4 ns  4 ns  5 ns  5 ns  4 ns
Pipeline register delay: 0.1 ns
Single cycle: 1/32 inst/ns
Pipelined: 1/6.1 inst/ns
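The effect of splitting can be checked numerically. The sketch below assumes a stage divides into equal halves, which is what the slide's numbers imply; `split_stage` and `pipelined_cycle_ns` are illustrative names.

```python
def pipelined_cycle_ns(stage_ns, reg_delay_ns=0.1):
    """Clock period: slowest stage plus the pipeline register delay."""
    return max(stage_ns) + reg_delay_ns


def split_stage(stage_ns, index, parts=2):
    """Replace one stage with `parts` equal sub-stages.
    Assumes the stage's logic divides evenly, which real hardware may not allow."""
    piece = stage_ns[index] / parts
    return stage_ns[:index] + [piece] * parts + stage_ns[index + 1:]


before = [6, 4, 8, 10, 4]        # F, D, X, M, WB
after = split_stage(before, 3)   # Memory  -> M1, M2
after = split_stage(after, 2)    # Execute -> X1, X2

print(after)
print(pipelined_cycle_ns(before), pipelined_cycle_ns(after))
```

After the split, Fetch at 6 ns is the slowest stage, so it sets the new clock period.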

Incorrect Execution
Easy, right? Not so fast.
Cycle               1    2    3    4    5    6    7    8    Time->
add $s0, $0, $0     IF   ID   EX   MEM  WB
or  $s3, $s0, $t3        IF   ID   EX   MEM  WB
sw  $s2, 0($t1)               IF   ID   EX   MEM  WB
and $s6, $s4, $t3                  IF   ID   EX   MEM  WB
In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 3
Ahhhh! Values cannot pass backwards in time.

Correct, Slow Execution
Easy, right? Not so fast.
Cycle               1    2    3    4    5    6    7    8    Time->
add $s0, $0, $0     IF   ID   EX   MEM  WB
or  $s3, $s0, $t3        IF   --   --   ID   EX   MEM  WB
sw  $s2, 0($t1)               IF   --   --   ID   EX   MEM
and $s6, $s4, $t3                            IF   ID   EX
(-- = stall, a wasted cycle)
In what cycle does the add write $s0? 1st half of cycle 5
In what cycle does the or read $s0? 2nd half of cycle 5
Only the register file reads or writes in half a cycle. All other stages take a full cycle; this is because of shared hardware.
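The number of wasted cycles can be worked out from the stage positions. This sketch assumes a five-stage pipeline with no forwarding and the register file convention from earlier (write in the 1st half of a cycle, read in the 2nd half), so a read in the same cycle as the write sees the new value. The function name is illustrative.

```python
def stalls_without_forwarding(distance: int) -> int:
    """Stall cycles for a consumer issued `distance` instructions after the
    producer of one of its source registers.

    Assumptions: five stages (IF, ID, EX, MEM, WB), no forwarding, and the
    register file writes in the 1st half of a cycle and reads in the 2nd half.
    """
    producer_wb = 5                  # producer fetched in cycle 1 writes back in cycle 5
    consumer_id = 1 + distance + 1   # consumer fetched in cycle 1 + distance, decoded next
    return max(0, producer_wb - consumer_id)


for d in (1, 2, 3):
    print(f"distance {d}: {stalls_without_forwarding(d)} stall cycle(s)")
```

For the add/or pair the distance is 1, which gives the two wasted cycles shown in the diagram.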

Barriers to pipelined performance
• Uneven stages
• Pipeline register delays
• Data Hazards
  - An instruction depends on the result of a previous instruction still in the pipeline

Solutions? • What can we try to reduce data hazards or their effect?

Default (do nothing): Stall
Program: add $s0, $0, $0; or $s3, $s0, $t3; sw $s2, 0($t1); and $s6, $s4, $t3
The add writes $s0 in the 1st half of cycle 5, so the or waits and reads $s0 in the 2nd half of cycle 5. The cycles spent waiting are stalls: wasted cycles.

Solution 1: Data Forwarding
Cycle               1    2    3    4    5    6    7    8    Time->
lw  $s0, 0($t4)     IF   ID   EX   MEM  WB
or  $s3, $s0, $t3        IF   ID   --   EX   MEM  WB
sw  $s2, 0($t1)               IF   --   ID   EX   MEM  WB
and $s6, $s4, $t3                       IF   ID   EX   MEM  WB
(-- = stall)
In what cycle is $s0 calculated in the machine? End of cycle 4 (the lw finishes MEM)
In what cycle is $s0 used? Without a stall, the beginning of cycle 4, which is too early. With one stall cycle, the beginning of cycle 5, when the value can be forwarded to the or.
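The one-cycle stall that remains after a load can be derived the same way as before. The sketch assumes full forwarding into the start of EX, ALU results ready at the end of EX, and load results ready at the end of MEM; the function name is illustrative.

```python
def stalls_with_forwarding(producer_is_load: bool, distance: int) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumptions: five-stage pipeline, the producer is fetched in cycle 1,
    ALU results are ready at the end of EX (cycle 3), load results at the
    end of MEM (cycle 4), and the consumer needs the value at the start of EX.
    """
    ready_at_end_of = 4 if producer_is_load else 3
    consumer_ex = 1 + distance + 2   # IF in cycle 1 + distance, then ID, then EX
    # The value must be ready by the end of the cycle before the consumer's EX.
    return max(0, ready_at_end_of + 1 - consumer_ex)


print(stalls_with_forwarding(producer_is_load=True, distance=1))   # lw followed by or
print(stalls_with_forwarding(producer_is_load=False, distance=1))  # add followed by or
```

Under these assumptions forwarding removes the stall entirely for an ALU producer, but a load followed immediately by its consumer still costs one cycle.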

Data Forwarding: Where are those wires?
[Figure: the pipelined datapath (Fetch, Decode, Execute, Memory, (Writeback), with pipeline registers), on which the forwarding wires are to be located.]

Data Forwarding Example 2
Draw the timing diagram with data forwarding. Draw arrows to indicate data passing through forwarding.
Cycle               1    2    3    4    5    6    7    8    9    10   11   12   Time->
lw   $t0, 0($s0)    F    D    X    M    W
addi $t0, $t0, 1         F    D
add  $s2, $s2, $t0            F
sw   $s2, 0($s0)
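One way to check a hand-drawn answer is a small in-order scheduler. This is a sketch under stated assumptions, not the slides' own method: full forwarding, every source (including store data) needed at the start of EX, ALU results ready at the end of EX, and load results ready at the end of MEM. The `Instr` class and `schedule_ex_cycles` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str]        # register written, or None (e.g. sw)
    srcs: Tuple[str, ...]      # registers read
    is_load: bool = False


def schedule_ex_cycles(program):
    """Return the EX cycle of each instruction for an in-order five-stage
    pipeline with full forwarding (assumptions listed in the text above)."""
    ready = {}      # register -> cycle at whose end its newest value exists
    ex_cycles = []
    prev_ex = 0
    for i, ins in enumerate(program):
        ex = max(i + 3, prev_ex + 1)           # ideal slot, and stay in order
        for reg in ins.srcs:
            if reg in ready:
                ex = max(ex, ready[reg] + 1)   # wait until the value can be forwarded
        ex_cycles.append(ex)
        if ins.dest is not None:
            ready[ins.dest] = ex + 1 if ins.is_load else ex
        prev_ex = ex
    return ex_cycles


program = [
    Instr("lw $t0, 0($s0)", "$t0", ("$s0",), is_load=True),
    Instr("addi $t0, $t0, 1", "$t0", ("$t0",)),
    Instr("add $s2, $s2, $t0", "$s2", ("$s2", "$t0")),
    Instr("sw $s2, 0($s0)", None, ("$s2", "$s0")),
]

for ins, ex in zip(program, schedule_ex_cycles(program)):
    print(f"{ins.text:20s} EX in cycle {ex}, WB in cycle {ex + 2}")
```

Under these assumptions only the addi stalls (one cycle, waiting on the load), and the sw reaches WB in cycle 9.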

Solution 2: Instruction Reordering (Before Reordering)
lw  $s0, 0($t4)
or  $s3, $s0, $t3     <- stalls waiting for $s0 (wasted cycles)
sw  $s2, 0($t1)
and $s6, $s4, $t3

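Solution 2: Instruction Reordering (After Reordering)
Cycle               1    2    3    4    5    6    7    8    Time->
lw  $s0, 0($t4)     IF   ID   EX   MEM  WB
sw  $s2, 0($t1)          IF   ID   EX   MEM  WB
and $s6, $s4, $t3             IF   ID   EX   MEM  WB
or  $s3, $s0, $t3                  IF   ID   EX   MEM  WB
The lw writes $s0 in the 1st half of cycle 5 and the or reads it in the 2nd half of cycle 5, so no stall is needed.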

Who reorders instructions?
• Static scheduling
  - Compiler
  - Simpler, but does not know when caches miss or whether loads/stores are to the same locations
• Dynamic scheduling
  - Hardware
  - More complicated, but has all of that knowledge at run time

Solution 2: Instruction Reordering
lw  $s0, 0($t4)
or  $s3, $s0, $t3
sw  $s3, 0($t1)
and $s0, $s4, $t3
[Timing diagram over cycles 1 to 8]
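Whether the or can be moved later depends on which registers its neighbors read and write. The sketch below is a simplified, register-only check that a compiler or hardware scheduler would have to go beyond: it ignores whether loads and stores touch the same memory location. The tuple encoding and `can_swap` are illustrative assumptions.

```python
def can_swap(first, second):
    """True if two adjacent instructions may exchange places, judged by
    register use alone. Each instruction is (dest_or_None, source_registers).
    Memory dependences between loads and stores are NOT checked here."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    second_reads_first = dest1 is not None and dest1 in srcs2
    second_overwrites_input = dest2 is not None and dest2 in srcs1
    same_destination = dest1 is not None and dest1 == dest2
    return not (second_reads_first or second_overwrites_input or same_destination)


OR = ("$s3", ("$s0", "$t3"))            # or $s3, $s0, $t3

# Earlier example: sw $s2, 0($t1) and and $s6, $s4, $t3
print(can_swap(OR, (None, ("$s2", "$t1"))))    # or may move below this sw
print(can_swap(OR, ("$s6", ("$s4", "$t3"))))   # or may move below this and

# This slide: sw $s3, 0($t1) and and $s0, $s4, $t3
print(can_swap(OR, (None, ("$s3", "$t1"))))    # sw reads $s3, which the or writes
print(can_swap(OR, ("$s0", ("$s4", "$t3"))))   # and overwrites $s0, which the or reads
```

With this slide's registers, both checks fail, so the or cannot simply be moved below the sw and the and as it was in the earlier example.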

Performance Measurements
• Cycle Time: Time in between clock ticks
• Latency: Time to finish a complete job, start to finish
• Throughput: Average jobs completed per unit time
• CyclesPerJob: Number of cycles between
