INSTRUCTION LEVEL PARALLELISM, Mahdi Nazm Bojnordi (PowerPoint presentation)



SLIDE 1

INSTRUCTION LEVEL PARALLELISM

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi
Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

• Announcement
  - Tonight: release HW2 (due 11:59PM, Sept. 18)
    - Note: late submission = no submission
    - One of your lowest assignment scores will be dropped :)
• This lecture
  - Recap multicycle
  - Impacts of data dependence
  - Pipeline performance
  - Instruction level parallelism

SLIDE 5

Multicycle Instructions

• Data hazards
  - more read-after-write hazards

[Pipeline diagram: stage sequence per instruction]

    load   f4, 0(r2)     IF ID EX MA WB
    mul    f0, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add    f2, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store  f2, 0(r2)     IF ID EX MA WB
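The read-after-write chain in this four-instruction sequence can be found mechanically. The following is a minimal sketch in Python, not part of the original slides: the `Inst` record and the `raw_hazards` helper are hypothetical names introduced for illustration, and the store is modeled as reading both `f2` and `r2` without writing a register.

```python
from typing import List, NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    op: str
    dest: Optional[str]        # register written (None for a store)
    srcs: Tuple[str, ...]      # registers read

# The sequence from the slide, encoded by hand (an assumption of this sketch).
PROGRAM = [
    Inst("load", "f4", ("r2",)),
    Inst("mul", "f0", ("f4", "f6")),
    Inst("add", "f2", ("f0", "f8")),
    Inst("store", None, ("f2", "r2")),
]

def raw_hazards(program: List[Inst]) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each RAW dependence."""
    last_writer = {}           # register -> index of the most recent writer
    hazards = []
    for i, inst in enumerate(program):
        for reg in inst.srcs:
            if reg in last_writer:
                hazards.append((last_writer[reg], i, reg))
        if inst.dest is not None:
            last_writer[inst.dest] = i
    return hazards
```

For `PROGRAM`, `raw_hazards` reports three dependences: load to mul on `f4`, mul to add on `f0`, and add to store on `f2`. Each consumer follows a long-latency producer, which is why multicycle units create more read-after-write stalls than a single-cycle EX stage.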

SLIDE 9

Multicycle Instructions

• Data hazards
  - potential write-after-write hazards

[Pipeline diagram: stage sequence per instruction]

    load   f4, 0(r2)     IF ID EX MA WB
    mul    f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add    f2, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store  f2, 0(r2)     IF ID EX MA WB

Out-of-order write-back!!
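To see why the write-back order flips, it helps to count cycles. The sketch below is an illustration under simplifying assumptions that are not stated on the slide: one instruction is fetched per cycle, no stalls are modeled, and the execute latencies (1 for load/store, 4 for add, 7 for mul) are read off the stage labels EX, A1 to A4, and M1 to M7. The names `EXEC_LATENCY`, `writeback_cycles`, and `waw_violations` are hypothetical.

```python
# Execute-stage latency per opcode (assumed from the stage labels in the diagram).
EXEC_LATENCY = {"load": 1, "store": 1, "add": 4, "mul": 7}

def writeback_cycles(ops):
    """Cycle in which each instruction reaches WB (one fetch per cycle, no stalls)."""
    cycles = []
    for i, op in enumerate(ops):
        fetch = i + 1                                  # IF cycle, 1-based
        # ID takes 1 cycle, execute takes its latency, MA takes 1, then WB.
        cycles.append(fetch + 1 + EXEC_LATENCY[op] + 1 + 1)
    return cycles

def waw_violations(program):
    """program: list of (op, dest). Return (older, younger, reg) where the
    younger instruction writes the same register before the older one."""
    wb = writeback_cycles([op for op, _ in program])
    found = []
    for i, (_, dest_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            dest_j = program[j][1]
            if dest_i is not None and dest_i == dest_j and wb[j] < wb[i]:
                found.append((i, j, dest_i))
    return found

PROGRAM = [("load", "f4"), ("mul", "f2"), ("add", "f2"), ("store", None)]
```

Under these assumptions `writeback_cycles` gives cycles 5, 12, 10, and 8, so the add writes `f2` in cycle 10 and the older mul overwrites it in cycle 12. `waw_violations(PROGRAM)` flags exactly that pair. A real pipeline would also stall on the read-after-write dependences, which this sketch deliberately ignores.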

SLIDE 10

Multicycle Instructions

• Data hazards
  - potential write-after-write hazards

[Pipeline diagram: stage sequence per instruction]

    load   f4, 0(r2)     IF ID EX MA WB
    mul    f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add    f2, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store  f2, 0(r2)     IF ID EX MA WB

In-order writes

SLIDE 13

Multicycle Instructions

• Imprecise exception
  - instructions do not necessarily complete in program order

[Pipeline diagram: stage sequence per instruction]

    load   f4, 0(r2)     IF ID EX MA WB
    mul    f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add    f3, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store  f2, 0(r2)     IF ID EX MA WB

Overflow!!

SLIDE 14

Multicycle Instructions

• Imprecise exception
  - state of the processor must be kept updated with respect to the program order

[Pipeline diagram: stage sequence per instruction]

    load   f4, 0(r2)     IF ID EX MA WB
    mul    f2, f4, f6    IF ID M1 M2 M3 M4 M5 M6 M7 MA WB
    add    f3, f0, f8    IF ID A1 A2 A3 A4 MA WB
    store  f2, 0(r2)     IF ID EX MA WB

In-order register file updates

SLIDE 16

Reorder Buffer

• Multicycle Instructions

    mul f2, f4, f6
    add f4, f0, f1
    sub f6, f3, f7

Reorder buffer contents:

    Inst.   Dest.
    mul     f2
    add     f4
    sub     f6
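The in-order update idea behind the reorder buffer can be sketched as a FIFO whose head must be finished before anything is written to the register file. This is a simplified illustration, not the design presented in the lecture: the `ReorderBuffer` class, its method names, and the dictionary-based entries are all assumptions made for the example.

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: entries are allocated in program order and
    the register file is updated only from the oldest finished entry."""

    def __init__(self):
        self.entries = deque()     # oldest instruction on the left
        self.regfile = {}          # architectural register state

    def dispatch(self, op, dest):
        """Allocate an entry at the tail, in program order."""
        entry = {"op": op, "dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result; functional units may finish in any order."""
        entry["value"] = value
        entry["done"] = True

    def commit(self):
        """Retire finished entries from the head only; return retired ops."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.regfile[entry["dest"]] = entry["value"]
            retired.append(entry["op"])
        return retired
```

With the three instructions from the slide, if the add finishes first, `commit()` retires nothing because the older mul is still at the head. Once the mul completes, one call retires mul and then add, so `f2` and `f4` are updated in program order even though execution finished out of order.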

SLIDE 18

Data Dependence

• Point of production (PoP)
  - The pipeline stage where an instruction produces a value that can be used by its following instructions
• Point of consumption (PoC)
  - The pipeline stage where an instruction consumes a produced value

[Figure: timeline showing Inst. 1 (producer) at its PoP and Inst. 2 (consumer) at its PoC]

SLIDE 21

Problem

• Consider a 10-stage pipeline processor, where point of production and point of consumption are separated by 4 cycles. Assume that half the instructions do not introduce a data hazard and half the instructions depend on their preceding instruction. What is the maximum attainable IPC?

[Figure: instruction timeline marking instructions and stall cycles]

IPC = 2/5 = 0.4
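The 2/5 result can be checked with a short calculation. The sketch below assumes, consistently with the slide's answer, that a PoP-to-PoC separation of 4 cycles costs a dependent instruction 3 stall cycles (it could otherwise have issued 1 cycle after its producer). The function name `max_ipc` and its parameters are hypothetical.

```python
def max_ipc(pop_poc_separation, dependent_fraction):
    """Best-case IPC of a scalar pipeline where a given fraction of
    instructions depends on the immediately preceding instruction."""
    # Assumption: a dependent instruction waits (separation - 1) extra cycles.
    stalls_per_dependent = pop_poc_separation - 1
    cpi = 1 + dependent_fraction * stalls_per_dependent
    return 1 / cpi
```

For the problem as stated, `max_ipc(4, 0.5)` gives a CPI of 1 + 0.5 × 3 = 2.5, that is, 2 instructions every 5 cycles, or an IPC of 0.4. Note that the pipeline depth of 10 does not enter the steady-state result.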

SLIDE 26

Performance vs. Pipeline Depth

• Impact of stall cycles on performance
  - Independent instructions
  - Dependent instructions

[Figure: performance vs. pipeline depth (number of stages), with curves labeled No Stalls, Fully Stalled, and Average; annotated with 1 / (latch latency)]

Increase overlap among instructions in the pipeline (Instruction Level Parallelism)
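The shape of the three curves can be reproduced with a toy model. Everything in the sketch below is an assumption made for illustration and is not taken from the slides: the cycle time is modeled as logic delay divided by depth plus a fixed latch latency, a dependent instruction is assumed to stall for `depth - 1` cycles, and the default parameter values are arbitrary.

```python
def performance(depth, logic_delay=10.0, latch_latency=0.5, stall_fraction=0.0):
    """Instructions per unit time for a pipeline with `depth` stages.

    stall_fraction = 0.0 models the "No Stalls" case (all independent),
    stall_fraction = 1.0 models the "Fully Stalled" case (all dependent),
    and values in between model an average mix.
    """
    # Assumed cycle-time model: logic is split evenly, each stage adds a latch.
    cycle_time = logic_delay / depth + latch_latency
    # Assumed stall model: a dependent instruction waits (depth - 1) cycles.
    cpi = 1 + stall_fraction * (depth - 1)
    return 1 / (cpi * cycle_time)
```

In this model the no-stall curve rises with depth and saturates at 1 / latch_latency, the fully stalled curve falls as depth grows, and a mixed workload peaks at a moderate depth. With the default values and `stall_fraction=0.5`, a depth of 4 outperforms both a depth of 1 and a depth of 64.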

SLIDE 28

Instruction Level Parallelism

• Potential overlap among instructions
  - A property of the program dataflow

Code 1 (ILP = 1, fully serial):

    ADD R1, R2, R3
    SUB R4, R1, R5
    XOR R6, R4, R7
    AND R8, R6, R9

Code 2 (ILP = 4, fully parallel):

    ADD R1, R2, R3
    SUB R4, R6, R5
    XOR R8, R2, R7
    AND R9, R6, R0
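The two ILP values follow from the length of the longest dependence chain. A minimal sketch of that calculation is given below; it considers only read-after-write dependences (as if registers were renamed), and the `average_ilp` function and the tuple encoding of instructions are assumptions of this example rather than anything defined in the lecture.

```python
def average_ilp(program):
    """program: list of (dest, src1, src2) register names.
    Returns instruction count divided by the longest RAW dependence chain."""
    depth_of = {}      # register -> chain depth of the instruction that last wrote it
    longest = 0
    for dest, *srcs in program:
        # An instruction sits one level below its deepest producer.
        depth = 1 + max((depth_of.get(s, 0) for s in srcs), default=0)
        depth_of[dest] = depth
        longest = max(longest, depth)
    return len(program) / longest

CODE_1 = [("R1", "R2", "R3"), ("R4", "R1", "R5"), ("R6", "R4", "R7"), ("R8", "R6", "R9")]
CODE_2 = [("R1", "R2", "R3"), ("R4", "R6", "R5"), ("R8", "R2", "R7"), ("R9", "R6", "R0")]
```

`average_ilp(CODE_1)` returns 1.0 because each instruction reads the result of the one before it, giving a chain of length 4. `average_ilp(CODE_2)` returns 4.0 because no instruction reads a register written earlier in the sequence.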

SLIDE 32

Instruction Level Parallelism

• Potential overlap among instructions
  - A property of the program dataflow
  - Influenced by compiler

X ← A + B + C + D

Code 1 (average ILP = 3/3 = 1, five registers):

    ADD R5, R1, R2
    ADD R5, R5, R3
    ADD R5, R5, R4

Code 2 (average ILP = 3/2 = 1.5, seven registers):

    ADD R6, R1, R2
    ADD R7, R3, R4
    ADD R5, R6, R7
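The difference between the two code sequences is the shape of the addition: a left-to-right chain versus a pairwise tree. The sketch below builds both shapes for any operand list and groups the additions into levels that could execute together. The function names `serial_sum`, `balanced_sum`, and `ilp_from_levels` are hypothetical, and the expressions are kept as strings purely for illustration.

```python
def serial_sum(operands):
    """Left-to-right chain: each ADD needs the previous result, one ADD per level."""
    levels = []
    acc = operands[0]
    for x in operands[1:]:
        acc = f"({acc}+{x})"
        levels.append([acc])
    return levels

def balanced_sum(operands):
    """Pairwise tree: ADDs within a level are independent of each other."""
    levels = []
    current = list(operands)
    while len(current) > 1:
        adds = [f"({current[i]}+{current[i + 1]})"
                for i in range(0, len(current) - 1, 2)]
        # Carry an unpaired operand over to the next level unchanged.
        leftover = [current[-1]] if len(current) % 2 else []
        levels.append(adds)
        current = adds + leftover
    return levels

def ilp_from_levels(levels):
    """Average ILP = total number of ADDs / number of dependent levels."""
    return sum(len(level) for level in levels) / len(levels)
```

For the operands A, B, C, D the chain has three levels of one ADD each (average ILP 3/3 = 1), while the tree has two levels, `(A+B)` and `(C+D)` followed by their sum (average ILP 3/2 = 1.5). The tree form needs temporaries for the partial sums, which matches the higher register count noted on the slide.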

SLIDE 33

Instruction Level Parallelism

• Potential overlap among instructions
  - A property of the program dataflow
  - Influenced by compiler
• An upper limit for attainable IPC for a given code
  - IPC represents exploited ILP

Average ILP = 3/3 = 1, five registers:

    ADD R5, R1, R2
    ADD R5, R5, R3
    ADD R5, R5, R4

Average ILP = 3/2 = 1.5, seven registers:

    ADD R6, R1, R2
    ADD R7, R3, R4
    ADD R5, R6, R7

SLIDE 34

Instruction Level Parallelism

• Potential overlap among instructions
  - A property of the program dataflow
  - Influenced by compiler
• An upper limit for attainable IPC for a given code
  - IPC represents exploited ILP
• Can be exploited by HW-/SW-intensive techniques
  - Dynamic scheduling in hardware
  - Static scheduling in software (compiler)