INSTRUCTION LEVEL PARALLELISM
Mahdi Nazm Bojnordi, Assistant Professor
School of Computing, University of Utah
CS/ECE 6810: Computer Architecture

Overview
- Announcement
  - Tonight: release HW2 (due 11:59 PM, Sept. 18)
    - Note: late submission = no submission
    - One of your lowest assignment scores will be dropped
- This lecture
  - Recap of multicycle instructions
  - Impacts of data dependence
  - Pipeline performance
  - Instruction level parallelism

Multicycle Instructions
- Data hazards
  - More read-after-write (RAW) hazards: each instruction below reads a register written by the instruction before it (f4, then f0, then f2)

  Stages traversed by each instruction:

    load  f4, 0(r2)     IF ID EX MA WB
    mul   f0, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add   f2, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store f2, 0(r2)     IF ID EX MA WB
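The RAW dependences in this sequence can be found mechanically. The following is a minimal sketch, not part of the lecture: the `Inst` record and the `raw_hazards` helper are hypothetical names, and the split of each instruction into destination and source registers is a manual encoding of the four instructions on the slide.

```python
from typing import NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    text: str
    dest: Optional[str]        # register written (None for a store)
    srcs: Tuple[str, ...]      # registers read

# The four instructions from the slide, encoded by hand.
program = [
    Inst("load  f4, 0(r2)",  "f4", ("r2",)),
    Inst("mul   f0, f4, f6", "f0", ("f4", "f6")),
    Inst("add   f2, f0, f8", "f2", ("f0", "f8")),
    Inst("store f2, 0(r2)",  None, ("f2", "r2")),
]

def raw_hazards(insts):
    """Return (producer index, consumer index, register) for every RAW dependence."""
    last_writer = {}           # register -> index of the latest instruction writing it
    hazards = []
    for i, inst in enumerate(insts):
        for reg in inst.srcs:
            if reg in last_writer:
                hazards.append((last_writer[reg], i, reg))
        if inst.dest is not None:
            last_writer[inst.dest] = i
    return hazards

hazards = raw_hazards(program)
for producer, consumer, reg in hazards:
    print(f"{program[consumer].text.split()[0]} reads {reg}, "
          f"written by {program[producer].text.split()[0]}")
```

The sketch only reports which pairs are dependent; how many stall cycles each pair costs depends on the forwarding paths of the actual pipeline, which the slide does not specify.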

Multicycle Instructions
- Data hazards
  - Potential write-after-write (WAW) hazards: mul and add both write f2

  Stages traversed by each instruction:

    load  f4, 0(r2)     IF ID EX MA WB
    mul   f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add   f2, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store f2, 0(r2)     IF ID EX MA WB

  - Out-of-order write-back: the add has fewer execute stages than the mul, so it can reach WB first; the mul then overwrites f2 with the older result
  - Fix: in-order writes
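The out-of-order write-back can be checked with a little cycle arithmetic. This is a rough sketch under stated assumptions: the execute-stage counts (7 for mul, 4 for add) are read off the slide, while the start cycles (in-order issue, one instruction per cycle, no stalls) and the helper name `writeback_cycle` are illustrative choices.

```python
# Execute-stage counts read off the slide: M1..M7 for mul, A1..A4 for add.
EXEC_STAGES = {"mul": 7, "add": 4}

def writeback_cycle(op, exec_start):
    """Cycle of the WB stage: the execute stages run back to back starting at
    exec_start, followed by one MA cycle and then WB (no stalls modelled)."""
    return exec_start + EXEC_STAGES[op] + 1

# (program index, op, destination, first execute cycle).
# Assumption: in-order issue, so the add starts executing one cycle after the mul.
insts = [(0, "mul", "f2", 0), (1, "add", "f2", 1)]

# Sort the register writes by the cycle in which they happen.
writes = sorted((writeback_cycle(op, start), idx, op, dest)
                for idx, op, dest, start in insts)
write_order = [idx for _, idx, _, _ in writes]
final_writer = writes[-1][2]
waw_violation = write_order != sorted(write_order)

for cycle, idx, op, dest in writes:
    print(f"cycle {cycle}: {op} writes {dest}")
print("final value of f2 comes from:", final_writer)
print("writes out of program order:", waw_violation)
```

Under these assumptions the add writes f2 two cycles before the mul, so f2 ends up holding the mul result even though the add is the later instruction in program order.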

Multicycle Instructions
- Imprecise exceptions
  - Instructions do not necessarily complete in program order

  Stages traversed by each instruction:

    load  f4, 0(r2)     IF ID EX MA WB
    mul   f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB    (Overflow!!)
    add   f3, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store f2, 0(r2)     IF ID EX MA WB

  - If the mul overflows late in its execution, the younger add may already have updated f3
  - The state of the processor must be kept updated with respect to the program order: in-order register file updates

Reorder Buffer
- Multicycle instructions

    mul f2, f4, f6
    add f4, f0, f1
    sub f6, f3, f7

- Reorder buffer contents, one entry per instruction in program order:

    Inst.   Dest.
    mul     f2
    add     f4
    sub     f6
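A reorder buffer lets instructions finish in any order while the register file is still updated in program order. The sketch below is a simplified illustration, not the design presented in the lecture: the `ReorderBuffer` class, its method names, and the result values are all hypothetical, and only the two fields shown on the slide (instruction, destination) plus a done flag, a value, and an exception flag are modelled.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.regfile = {}        # architectural register state

    def allocate(self, inst, dest):
        """Add an entry at the tail, in program order."""
        entry = {"inst": inst, "dest": dest, "done": False,
                 "value": None, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        """Mark an entry as finished; this may happen in any order."""
        entry["done"] = True
        entry["value"] = value
        entry["exception"] = exception

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["exception"]:
                # Squash the faulting instruction and everything younger.
                self.entries.clear()
                break
            self.entries.popleft()
            self.regfile[head["dest"]] = head["value"]
            committed.append(head["inst"])
        return committed

# The three instructions from the slide; the values are made up.
rob = ReorderBuffer()
mul = rob.allocate("mul", "f2")
add = rob.allocate("add", "f4")
sub = rob.allocate("sub", "f6")

rob.complete(add, 1.0)           # add finishes first
first = rob.commit()             # nothing retires: mul is still executing
rob.complete(sub, 2.0)
rob.complete(mul, 3.0)
second = rob.commit()            # now all three retire in program order

# Exception case: the mul faults after the add has already finished.
rob2 = ReorderBuffer()
mul2 = rob2.allocate("mul", "f2")
add2 = rob2.allocate("add", "f4")
rob2.complete(add2, 1.0)
rob2.complete(mul2, exception=True)
third = rob2.commit()            # nothing retires; f4 is never written
```

Because results reach the register file only through `commit`, a fault in an older instruction leaves no trace of younger instructions in the architectural state, which is the property needed for precise exceptions.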

Data Dependence
- Point of production (PoP)
  - The pipeline stage where an instruction produces a value that can be used by the instructions that follow it
- Point of consumption (PoC)
  - The pipeline stage where an instruction consumes a produced value

[Figure: Inst. 1 (producer) and Inst. 2 (consumer) drawn along a time axis, with PoP marked on the producer and PoC marked on the consumer.]

Problem
- Consider a 10-stage pipelined processor in which the point of production and the point of consumption are separated by 4 cycles. Assume that half the instructions do not introduce a data hazard and half the instructions depend on their preceding instruction. What is the maximum attainable IPC?
- Solution: an independent instruction issues one cycle after its predecessor, while a dependent instruction must wait until 4 cycles after its producer (3 stall cycles). With the two kinds in equal numbers, 2 instructions take 1 + 4 = 5 cycles on average:

  $\text{IPC} = \frac{2}{5} = 0.4$

[Figure: instruction timeline with the stall cycles marked.]
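The same arithmetic can be written as a small function. This is a sketch of one reading that reproduces the slide's answer of 0.4: the function name `max_ipc` and its parameters are illustrative, and it assumes that an independent instruction costs 1 cycle while a dependent one costs as many cycles as separate the point of production from the point of consumption.

```python
def max_ipc(dependent_fraction, pop_to_poc_cycles):
    """Upper bound on IPC for a single-issue pipeline.

    Assumption: an independent instruction issues 1 cycle after its
    predecessor; a dependent instruction issues pop_to_poc_cycles after
    its producer (that is, pop_to_poc_cycles - 1 stall cycles).
    """
    avg_cycles_per_inst = ((1 - dependent_fraction) * 1
                           + dependent_fraction * pop_to_poc_cycles)
    return 1 / avg_cycles_per_inst

ipc = max_ipc(dependent_fraction=0.5, pop_to_poc_cycles=4)
print(f"maximum attainable IPC = {ipc:.2f}")
```

Note that the 10-stage depth does not enter the formula: under this reading only the distance between production and consumption determines the stall cost.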

Performance vs. Pipeline Depth
- Impact of stall cycles on performance
  - Independent instructions
  - Dependent instructions

[Figure: performance versus pipeline depth (number of stages), with three curves labeled No Stalls, Fully Stalled, and Average, and the level $\frac{1}{\text{latch latency}}$ marked.]

- Increase overlap among instructions in the pipeline (instruction level parallelism)
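The shape of such curves can be explored with a simple analytical model. Everything in this sketch is an assumption rather than the lecture's own model: the logic delay is split evenly across the stages, every stage adds a fixed latch latency, a dependent instruction waits one full pipeline traversal for its producer, and the "average" case is taken as a 50/50 mix. The function name and default values are hypothetical.

```python
def throughput(depth, logic_delay=10.0, latch_latency=0.5, dependent_fraction=0.0):
    """Instructions per unit time for a pipeline with `depth` stages.

    Assumptions: cycle time = logic_delay / depth + latch_latency;
    an independent instruction costs 1 cycle, a dependent one `depth` cycles.
    """
    cycle_time = logic_delay / depth + latch_latency
    cycles_per_inst = (1.0 - dependent_fraction) + dependent_fraction * depth
    return 1.0 / (cycle_time * cycles_per_inst)

depths = [1, 2, 5, 10, 20, 50]
no_stalls = [throughput(d) for d in depths]
fully_stalled = [throughput(d, dependent_fraction=1.0) for d in depths]
average = [throughput(d, dependent_fraction=0.5) for d in depths]

print("depth  no-stalls  average  fully-stalled")
for d, a, b, c in zip(depths, no_stalls, average, fully_stalled):
    print(f"{d:5d}  {a:9.3f}  {b:7.3f}  {c:13.3f}")
```

In this model the no-stall curve rises with depth but never exceeds 1 / latch_latency, while the fully stalled curve falls as stages are added, because each extra stage contributes only latch overhead to a fully serialized instruction stream.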

Instruction Level Parallelism
- Potential overlap among instructions
  - A property of the program dataflow

    Code 1                  Code 2
    ADD R1, R2, R3          ADD R1, R2, R3
    SUB R4, R1, R5          SUB R4, R6, R5
    XOR R6, R4, R7          XOR R8, R2, R7
    AND R8, R6, R9          AND R9, R6, R0

    ILP = 1                 ILP = 4
    Fully serial            Fully parallel
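One common way to quantify this is to divide the instruction count by the length of the longest chain of RAW dependences. The sketch below applies that definition to the two code sequences on the slide; the function name `ilp` and the tuple encoding `(dest, src1, src2)` are illustrative, and only true (RAW) dependences are considered.

```python
def ilp(insts):
    """Instruction count divided by the longest RAW dependence chain.

    insts: list of (dest, src1, src2) register-name tuples in program order.
    """
    depth_of_reg = {}    # register -> chain depth of the instruction that last wrote it
    longest = 0
    for dest, *srcs in insts:
        # Depth is one more than the deepest producer among the sources;
        # registers never written in this snippet count as depth 0.
        depth = 1 + max((depth_of_reg.get(s, 0) for s in srcs), default=0)
        depth_of_reg[dest] = depth
        longest = max(longest, depth)
    return len(insts) / longest

code1 = [("R1", "R2", "R3"),   # ADD R1, R2, R3
         ("R4", "R1", "R5"),   # SUB R4, R1, R5
         ("R6", "R4", "R7"),   # XOR R6, R4, R7
         ("R8", "R6", "R9")]   # AND R8, R6, R9

code2 = [("R1", "R2", "R3"),   # ADD R1, R2, R3
         ("R4", "R6", "R5"),   # SUB R4, R6, R5
         ("R8", "R2", "R7"),   # XOR R8, R2, R7
         ("R9", "R6", "R0")]   # AND R9, R6, R0

ilp1 = ilp(code1)
ilp2 = ilp(code2)
print("Code 1 ILP:", ilp1)
print("Code 2 ILP:", ilp2)
```

In Code 1 every instruction reads the result of the one before it, so the chain is four instructions long; in Code 2 no instruction reads a register written earlier in the snippet, so all four could in principle execute together.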

Instruction Level Parallelism
- Potential overlap among instructions
  - A property of the program dataflow
  - Influenced by the compiler

    X ← A + B + C + D

    Code 1:                     Code 2:
    ADD R5, R1, R2              ADD R6, R1, R2
    ADD R5, R5, R3              ADD R7, R3, R4
    ADD R5, R5, R4              ADD R5, R6, R7

    Average ILP = 3/3 = 1       Average ILP = 3/2 = 1.5
    Five registers              Seven registers
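The two code shapes for the four-operand sum correspond to a left-to-right chain and a pairwise (tree) reduction. The following sketch generates both and measures the trade-off between ILP and register use. The helper names (`chain_sum`, `tree_sum`, `critical_path`, `registers_used`) are hypothetical, and a real compiler may only reassociate like this when the arithmetic permits it.

```python
def chain_sum(operands, acc):
    """Left-to-right sum that reuses a single accumulator register."""
    code = [(acc, operands[0], operands[1])]
    for src in operands[2:]:
        code.append((acc, acc, src))
    return code

def tree_sum(operands, result, temps):
    """Pairwise sum; `temps` supplies fresh registers for partial sums."""
    temps = iter(temps)
    level = list(operands)
    code = []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # The last remaining pair writes the result register.
            dest = result if len(level) == 2 else next(temps)
            code.append((dest, level[i], level[i + 1]))
            nxt.append(dest)
        if len(level) % 2:           # odd operand carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return code

def critical_path(code):
    """Length of the longest RAW dependence chain."""
    depth, longest = {}, 0
    for dest, a, b in code:
        depth[dest] = 1 + max(depth.get(a, 0), depth.get(b, 0))
        longest = max(longest, depth[dest])
    return longest

def registers_used(code):
    return len({reg for inst in code for reg in inst})

operands = ["R1", "R2", "R3", "R4"]
chain = chain_sum(operands, "R5")
tree = tree_sum(operands, "R5", ["R6", "R7"])

for name, code in (("chain", chain), ("tree", tree)):
    print(name, "ILP =", len(code) / critical_path(code),
          "registers =", registers_used(code))
```

For four operands the generated sequences match the two codes on the slide: the tree form raises the average ILP from 1 to 1.5 at the cost of two extra registers.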

Instruction Level Parallelism
- Potential overlap among instructions
  - A property of the program dataflow
  - Influenced by the compiler
- An upper limit on the attainable IPC for a given code
  - IPC represents exploited ILP
- Can be exploited by hardware- or software-intensive techniques
  - Dynamic scheduling in hardware
  - Static scheduling in software (compiler)
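To see why ILP bounds the attainable IPC, a toy scheduler can issue instructions as early as their operands allow, subject to a limit on how many may issue per cycle. This is a rough sketch under several assumptions that do not come from the lecture: every operation has a latency of one cycle, only RAW dependences constrain issue (as if registers were renamed), and the machine's issue `width` is a free parameter. The function name `issue_ipc` is hypothetical.

```python
def issue_ipc(code, width):
    """IPC achieved by a greedy dataflow schedule with a given issue width.

    code: list of (dest, src1, src2) tuples in program order.
    Assumptions: unit latency, RAW dependences only, at most `width`
    instructions issued per cycle.
    """
    ready_cycle = {}        # register -> first cycle in which its value can be read
    issued_in_cycle = {}    # cycle -> number of instructions issued in that cycle
    last_cycle = 0
    for dest, *srcs in code:
        cycle = max((ready_cycle.get(s, 1) for s in srcs), default=1)
        while issued_in_cycle.get(cycle, 0) >= width:
            cycle += 1      # issue slots full, try the next cycle
        issued_in_cycle[cycle] = issued_in_cycle.get(cycle, 0) + 1
        ready_cycle[dest] = cycle + 1
        last_cycle = max(last_cycle, cycle)
    return len(code) / last_cycle

# The two versions of X = A + B + C + D from the earlier slide.
chain = [("R5", "R1", "R2"), ("R5", "R5", "R3"), ("R5", "R5", "R4")]
tree = [("R6", "R1", "R2"), ("R7", "R3", "R4"), ("R5", "R6", "R7")]

for width in (1, 2, 4):
    print(f"width {width}: chain IPC = {issue_ipc(chain, width)}, "
          f"tree IPC = {issue_ipc(tree, width)}")
```

In this toy model the chain version stays at an IPC of 1 no matter how wide the machine is, while the tree version reaches its ILP of 1.5 once two instructions can issue per cycle, and wider issue gives nothing further.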
