Hazards 1 Today Quiz recap Quiz 2 correction Flash memory Data - - PowerPoint PPT Presentation



SLIDE 1

Hazards

SLIDE 2

Today

  • Quiz recap
  • Quiz 2 correction
  • Flash memory
  • Data Hazards
  • Watch for announcement about signing up for a project interview.
  • Remember! There is a midterm next Tuesday that covers everything through this Thursday

  • Review session: Next Monday, 7pm cse4140

SLIDE 3

Hazards: Key Points

  • Hazards cause imperfect pipelining
  • They prevent us from achieving CPI = 1
  • They are generally caused by “counter flow” data dependences in the pipeline

  • Three kinds
  • Structural -- contention for hardware resources
  • Data -- a data value is not available when/where it is needed.
  • Control -- the next instruction to execute is not known.
  • Two ways to deal with hazards
  • Removal -- add hardware and/or complexity to work around the hazard so it does not exist
  • Bypassing/forwarding
  • Speculation
  • Stall -- Sacrifice performance to prevent the hazard from occurring
  • Bubbles

SLIDE 4

Data Dependences

  • A data dependence occurs whenever one instruction needs a value produced by another.

  • Register values (for now)
  • Also memory accesses (more on this later)

add $s0, $t0, $t1
sub $t2, $s0, $t3
add $t3, $s0, $t4
and $t3, $t2, $t4

sw $t1, 0($t2)
ld $t3, 0($t2)
ld $t4, 16($s4)
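
As a rough illustration of the register dependences in the first snippet, the sketch below scans a program and reports every case where an instruction reads a register written by an earlier one. The helper names (`parse`, `raw_dependences`) are hypothetical, and the parsing rule (first register is the destination, except for `sw`) is a simplifying assumption. Memory dependences, such as the `sw`/`ld` pair above, are not tracked.

```python
import re

def parse(instr):
    """Split a MIPS-style instruction into (opcode, dest, sources).

    Assumption: every instruction writes its first register and reads the
    rest, except sw, which writes no register and reads all of them.
    """
    opcode = instr.split()[0]
    regs = re.findall(r"\$\w+", instr)
    if opcode == "sw":
        return opcode, None, regs
    return opcode, regs[0], regs[1:]

def raw_dependences(program):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for i, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

program = [
    "add $s0, $t0, $t1",
    "sub $t2, $s0, $t3",
    "add $t3, $s0, $t4",
    "and $t3, $t2, $t4",
]
for producer, consumer, reg in raw_dependences(program):
    print(f"{program[consumer]!r} needs {reg} from {program[producer]!r}")
```

Here both `sub` and the second `add` depend on `$s0` from the first `add`, and `and` depends on `$t2` from `sub`.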

SLIDE 5

Dependences in the pipeline

  • In our simple pipeline, these instructions cause a hazard

[Pipeline diagram: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through Fetch, Decode, EX, Mem, Write back; horizontal axis: Cycles]

SLIDE 6

How can we fix it?

  • Ideas?

SLIDE 7

Solution 1: Make the compiler deal with it.

  • Expose hazards to the big A architecture
  • A result is available N instructions after the instruction that generates it.

  • In the meantime, the register file has the old value.
  • “delay slots”
  • What is N?
  • Can it change?
  • What can the compiler do?

[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

SLIDE 8

Compiling for delay slots

add $s0, $t0, $t1
sub $t2, $s0, $t3
add $t3, $s0, $t4
and $t7, $t5, $t4

Rearrange instructions

add $s0, $t0, $t1
and $t7, $t5, $t4
sub $t2, $s0, $t3
add $t3, $s0, $t4

  • The compiler must fill the delay slots with other instructions

  • What if it can’t?
  • No-ops
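
To make the no-op fallback concrete, here is a minimal sketch of what a compiler pass might do when it cannot find useful work for a delay slot. It assumes N = 2 delay slots (a consumer must sit more than two instructions after its producer), which matches the two-cycle stall used later in the lecture but is still an assumption. The function names are hypothetical.

```python
import re

NOP = "nop"

def dest_and_sources(instr):
    """Assumption: the first register is the destination, the rest are sources."""
    regs = re.findall(r"\$\w+", instr)
    return (regs[0], regs[1:]) if regs else (None, [])

def fill_delay_slots(program, delay_slots):
    """Pad with no-ops so every consumer is more than `delay_slots`
    instructions after the instruction that produces its input."""
    scheduled = []
    written_at = {}  # register -> position in `scheduled` of its last writer
    for instr in program:
        dest, sources = dest_and_sources(instr)
        # Earliest legal position for this instruction.
        earliest = max(
            (written_at[r] + delay_slots + 1 for r in sources if r in written_at),
            default=0,
        )
        while len(scheduled) < earliest:
            scheduled.append(NOP)
        if dest is not None:
            written_at[dest] = len(scheduled)
        scheduled.append(instr)
    return scheduled

original = [
    "add $s0, $t0, $t1",
    "sub $t2, $s0, $t3",
    "add $t3, $s0, $t4",
    "and $t7, $t5, $t4",
]
rearranged = [original[0], original[3], original[1], original[2]]

for name, prog in (("original", original), ("rearranged", rearranged)):
    padded = fill_delay_slots(prog, delay_slots=2)
    print(name, "needs", padded.count(NOP), "no-op(s):", padded)
```

Under these assumptions the original order needs two no-ops while the rearranged order needs only one, because the independent `and` fills one of the slots.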
SLIDE 9

Solution 2: Stall

  • When you need a value that is not ready, “stall”
  • Suspend the execution of the executing instruction and those that follow.
  • This introduces a pipeline “bubble”

[Pipeline diagram: add $s0, $t0, $t1 passes through Fetch, Decode, EX, Mem, Write back; sub $t2, $s0, $t3 is fetched, then waits in a Stall before continuing through Decode, EX, Mem, Write back; horizontal axis: Cycles]

SLIDE 10

Stalling the pipeline

  • All pipeline stages preceding the stage where the hazard occurs freeze
  • Disable the PC update
  • Disable the pipeline registers
  • This is essentially equivalent to always inserting a nop when a hazard exists
  • Insert nop control bits at stalled stage (decode in our example)
  • How is this solution still potentially “better” than relying on the compiler?
  • The compiler can still act like there are delay slots to avoid stalls.
  • Implementation details are not exposed in the ISA
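
The freeze-and-insert-nop behavior can be sketched as a tiny cycle-by-cycle model. This is only an illustration, assuming no forwarding and that a result becomes readable two cycles after its producer leaves decode (the `stall_distance` parameter). The function name and the (dest, sources) tuple encoding are hypothetical.

```python
def run_with_stalls(program, stall_distance=2):
    """Model the decode-stage interlock.

    program: list of (dest, sources) tuples, one per instruction.
    Returns one entry per cycle: the index of the instruction that
    leaves decode that cycle, or "bubble" when the stage is stalled.
    """
    issued = []
    pc = 0
    while pc < len(program):
        _, sources = program[pc]
        # Instructions that left decode recently have not written back yet.
        recent = issued[max(0, len(issued) - stall_distance):]
        in_flight = [program[i][0] for i in recent if i != "bubble"]
        if any(src in in_flight for src in sources):
            issued.append("bubble")  # PC and pipeline registers hold; nop control bits go down the pipe
        else:
            issued.append(pc)
            pc += 1
    return issued

# add $s0, $t0, $t1 ; sub $t2, $s0, $t3
two_instr = [("$s0", ["$t0", "$t1"]), ("$t2", ["$s0", "$t3"])]
print(run_with_stalls(two_instr))
```

With these assumptions the `sub` waits for two bubbles behind the `add`. The same program runs unchanged on hardware with a different `stall_distance`, which is the sense in which the implementation details stay out of the ISA.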

SLIDE 11

The Impact of Stalling On Performance

  • ET = I * CPI * CT
  • I and CT are constant
  • What is the impact of stalling on CPI?
  • What do we need to know to figure it out?

SLIDE 12

The Impact of Stalling On Performance

  • ET = I * CPI * CT
  • I and CT are constant
  • What is the impact of stalling on CPI?
  • Fraction of instructions that stall: 30%
  • Baseline CPI = 1
  • Stall CPI = 1 + 2 = 3
  • New CPI = 0.3*3 + 0.7*1 = 1.6
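
The same arithmetic as a small sketch, so other stall fractions and penalties can be tried. The function names and the example values for I and CT are made up for illustration.

```python
def cpi_with_stalls(stall_fraction, stall_cycles, base_cpi=1.0):
    """Weighted average of stalling and non-stalling instructions."""
    stalled_cpi = base_cpi + stall_cycles
    return stall_fraction * stalled_cpi + (1 - stall_fraction) * base_cpi

def execution_time(instructions, cpi, cycle_time):
    """ET = I * CPI * CT."""
    return instructions * cpi * cycle_time

new_cpi = cpi_with_stalls(stall_fraction=0.3, stall_cycles=2)

# I and CT are constant, so the slowdown is just the CPI ratio.
# The values 1_000_000 instructions and 1e-9 s per cycle are arbitrary.
baseline = execution_time(1_000_000, 1.0, 1e-9)
stalled = execution_time(1_000_000, new_cpi, 1e-9)
print(f"CPI = {new_cpi:.2f}, slowdown = {stalled / baseline:.2f}x")
```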

SLIDE 13

Solution 3: Bypassing/Forwarding

  • Data values are computed in _____ and _______ but “publicized in write back”?

[Pipeline diagram: Fetch, Decode, EX, Mem, Write back, annotated with where inputs are needed, where results are known, and where results are "published" to registers]

SLIDE 14

Bypassing or Forwarding

  • Take the values, wherever they are

[Pipeline diagram: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through Fetch, Decode, EX, Mem, Write back; horizontal axis: Cycles]

SLIDE 15

Forwarding Paths

[Pipeline diagram: add $s0, $t0, $t1 followed by three copies of sub $t2, $s0, $t3, each passing through Fetch, Decode, EX, Mem, Write back; horizontal axis: Cycles]

SLIDE 16

Forwarding in Hardware

[Datapath diagram: PC (incremented by 4), Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, adders, ALU, and Data Memory, separated by the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB]

SLIDE 17

Forwarding Control

  • The forwarding unit detects instances when the destination and source registers of executing instructions match
  • Set the control lines on the ALU input muxes accordingly
  • Stall if, for some reason, forwarding is not possible.
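
The register-matching check can be sketched in a few lines. This follows the usual textbook formulation for a five-stage pipeline rather than anything shown on the slides. The dictionary fields (`reg_write`, `dest`), the function name, and the returned mux labels are assumptions for illustration.

```python
def forward_select(src_reg, exec_mem, mem_wb):
    """Pick the source for one ALU input mux.

    exec_mem / mem_wb describe the instructions held in the Exec/Mem and
    Mem/WB pipeline registers: 'reg_write' (bool) and 'dest' (register
    name or None). The newest matching result wins, so Exec/Mem is
    checked first.
    """
    if src_reg == "$zero":
        return "RegFile"  # register zero is never forwarded
    if exec_mem["reg_write"] and exec_mem["dest"] == src_reg:
        return "Exec/Mem"
    if mem_wb["reg_write"] and mem_wb["dest"] == src_reg:
        return "Mem/WB"
    return "RegFile"

# sub $t2, $s0, $t3 is in EX; add $s0, $t0, $t1 is one stage ahead of it.
add_in_mem = {"reg_write": True, "dest": "$s0"}
older = {"reg_write": False, "dest": None}
print(forward_select("$s0", add_in_mem, older))  # first ALU input
print(forward_select("$t3", add_in_mem, older))  # second ALU input
```

Checking Exec/Mem before Mem/WB matters when two in-flight instructions write the same register: the consumer should see the most recent value.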

SLIDE 18

Forwarding for Loads

  • Load values come from the Mem stage

[Pipeline diagram: ld $s0, 0($t0) passes through Fetch, Decode, EX, Mem, Write back; the following sub $t2, $s0, $t3 passes through Fetch, Decode, EX, Mem; horizontal axis: Cycles]

Time travel presents significant implementation challenges

SLIDE 19

What can we do?

  • Punt to the compiler
  • Complete solution.
  • Same dangers apply as before.
  • Always stall.
  • Forward when possible, stall otherwise
  • Here the compiler still has leverage
  • Code will be faster if the compiler generates code as if there is a delay slot.

  • If the compiler can’t fix it, the hardware will stall
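
A minimal sketch of the "forward when possible, stall otherwise" policy for loads: with forwarding in place, the only remaining stall in this model is a load immediately followed by a consumer of its result. The dictionary fields and function names are hypothetical, and the one-cycle penalty is an assumption about the five-stage pipeline used here.

```python
def must_stall(in_ex, in_decode):
    """True when the instruction in decode needs the result of a load in EX."""
    return in_ex["is_load"] and in_ex["dest"] in in_decode["sources"]

def load_use_stalls(program):
    """Count one-cycle load-use stalls in a straight-line instruction sequence."""
    return sum(1 for prev, cur in zip(program, program[1:]) if must_stall(prev, cur))

ld = {"is_load": True, "dest": "$s0", "sources": ["$t0"]}    # ld  $s0, 0($t0)
sub = {"is_load": False, "dest": "$t2", "sources": ["$s0", "$t3"]}  # sub $t2, $s0, $t3
other = {"is_load": False, "dest": "$t7", "sources": ["$t5", "$t4"]}  # and $t7, $t5, $t4

print(load_use_stalls([ld, sub, other]))  # consumer right behind the load
print(load_use_stalls([ld, other, sub]))  # compiler filled the slot
```

When the compiler places an independent instruction after the load, the hardware check never fires, which is the leverage the slide describes.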

SLIDE 20

Performance cost of stalling

  • ET = I * CPI * CT
  • CPI = 1 + %Stall * StallTime
  • % Stall is determined by how aggressive our bypassing is and the quality of our compiler.
  • Stall time is related to pipeline depth. In our case, it is 1 or 2, because our pipeline is shallow.
  • In deeper pipelines, it can be larger.

SLIDE 21

Hardware Cost of Forwarding

  • In our pipeline, adding forwarding required relatively little hardware.
  • For deeper pipelines it gets much more expensive
  • Roughly: ALUs * pipeline stages you need to forward over
  • Some modern processors have multiple ALUs (4-5)
  • And deeper pipelines (4-5 stages to forward across)
