CSCI341 Lecture 37, Introduction to Parallelism PIPELINING - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CSCI341

Lecture 37, Introduction to Parallelism

slide-2
SLIDE 2

PIPELINING

“Exploits potential parallelism among instructions.” “Instruction-level parallelism”

slide-3
SLIDE 3

INSTRUCTION-LEVEL PARALLELISM

  • Increase depth of pipeline (greater overlap of instructions)
  • Replicate hardware (handle more instructions simultaneously)
  • aka “multiple issue”
slide-4
SLIDE 4

MULTIPLE ISSUE

  • Instruction execution rate can exceed the clock rate
  • CPI less than 1
slide-5
SLIDE 5

EXAMPLE

A 4 GHz four-way multiple-issue microprocessor (modern CPUs approach 3 to 6 instructions per cycle):

  • Peak rate of 16 billion instructions per second
  • Ideal CPI of 0.25 (IPC of 4)
  • In a five-stage pipeline, 20 instructions in progress at once
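
The numbers on this slide follow from three multiplications. A minimal sketch of the arithmetic, assuming an ideal machine with no stalls (the function name and its parameters are made up for illustration):

```python
def peak_throughput(clock_hz, issue_width, pipeline_stages):
    """Return (instructions per second, ideal CPI, instructions in flight).

    Assumes every issue slot is filled on every cycle, which real code
    rarely achieves.
    """
    instructions_per_second = clock_hz * issue_width
    ideal_cpi = 1 / issue_width
    in_flight = issue_width * pipeline_stages
    return instructions_per_second, ideal_cpi, in_flight


# 4 GHz clock, four-way issue, five-stage pipeline
ips, cpi, in_flight = peak_throughput(4_000_000_000, 4, 5)
print(ips, cpi, in_flight)  # 16000000000 0.25 20
```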

slide-6
SLIDE 6

2 IMPORTANT IMPLEMENTATIONS

  • Compile-time (statically)
  • During execution (dynamically)
slide-7
SLIDE 7

CHALLENGES

  • How does the CPU determine how many instructions (and which instructions) can be issued?

  • How do we deal with data/control hazards?
slide-8
SLIDE 8

SPECULATION

  • The compiler / CPU “guesses” about the properties of an instruction.
  • e.g., branching, storing & loading
  • Potential for bad guesses (changing the decision is complex)
  • Buffering speculated instructions
  • Buffering exceptions
slide-9
SLIDE 9

STATIC MULTIPLE ISSUE SYSTEM

Heavy reliance on the compiler.

slide-10
SLIDE 10

ISSUE PACKET

  • Set of instructions issued in a given clock cycle
  • Very Long Instruction Word (VLIW)
slide-11
SLIDE 11

CONSIDER...

A two-issue MIPS processor.

  • One instruction can be ALU operation or branch
  • The other can be load/store

(let’s call it “TIM”)

slide-12
SLIDE 12

TIM

  • How many bits of instructions per cycle?
  • Instructions paired, aligned.
  • ALU/branch instruction is “first.”
  • If one member of the pair can’t be used, replace with nop.
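
A rough sketch of the pairing rule, in Python rather than hardware. It assumes 32-bit MIPS instructions, so a packet is 2 × 32 = 64 bits per cycle. The `pack` helper is hypothetical and deliberately naive: it pairs instructions in program order and ignores data dependences, which a real compiler would have to check.

```python
MEM_OPS = {"lw", "sw"}          # data-transfer opcodes (second slot)
NOP = "nop"
BITS_PER_PACKET = 2 * 32        # two 32-bit instructions issued together


def opcode(instr):
    return instr.split()[0]


def pack(instrs):
    """Pair instructions into (ALU/branch, load/store) issue packets.

    Assumption: dependences between instructions are ignored; this only
    illustrates slot assignment and nop filling.
    """
    packets = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        if opcode(first) in MEM_OPS:
            # Nothing available for the ALU/branch slot: fill it with a nop.
            packets.append((NOP, first))
            i += 1
        elif i + 1 < len(instrs) and opcode(instrs[i + 1]) in MEM_OPS:
            packets.append((first, instrs[i + 1]))
            i += 2
        else:
            # Nothing available for the load/store slot: fill it with a nop.
            packets.append((first, NOP))
            i += 1
    return packets


program = [
    "addu $t1, $t2, $t3",
    "lw $t0, 0($s1)",
    "sw $t4, 4($s1)",
    "addi $s2, $s2, 1",
]
for alu_slot, mem_slot in pack(program):
    print(f"{alu_slot:<22} | {mem_slot}")
```

Four instructions become three packets here, with one nop in each of the last two.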
slide-13
SLIDE 13

TIM

Two instructions per stage at a time.

slide-14
SLIDE 14

TIM HAZARDS

  • Sometimes, it’s the compiler’s full responsibility
  • Remove hazards by arranging/scheduling instructions
  • Inserting NOPs where necessary, etc
slide-15
SLIDE 15

TIM HAZARDS

  • Sometimes, the hardware detects hazards between issue packets

  • Generates stalls
  • Still relies on compiler to generate appropriate packets
slide-16
SLIDE 16

TIM’S DATAPATH

slide-17
SLIDE 17

TIM’S DATAPATH

  • 32 more bits from instruction memory
  • Two more read ‘ports’
  • One more write ‘port’
  • Extra ALU

slide-18
SLIDE 18

NO MAGIC SPEED BOOST

Potential to double performance. Potential for hazards to impact two instructions.

slide-19
SLIDE 19

USE LATENCY

that’s a noun

“Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.”

slide-20
SLIDE 20

MIPS

use latency of one cycle

TIM

Potentially impacts two instructions.
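
To see why a one-cycle use latency touches two instructions on TIM, here is a small checker. It is only a sketch: packets are modeled as (ALU/branch, load/store) string pairs, registers are pulled out with a regular expression, and the helper names are invented for this example. It flags any packet that reads a register loaded by the packet immediately before it.

```python
import re


def regs(instr):
    """All register names in an instruction, in order of appearance."""
    return re.findall(r"\$\w+", instr)


def sources(instr):
    """Registers an instruction reads (simplified MIPS subset)."""
    op = instr.split()[0]
    r = regs(instr)
    # sw, bne and beq only read registers; other ops write their first one.
    return r if op in {"sw", "bne", "beq"} else r[1:]


def load_use_stalls(packets):
    """Indices of packets that use a load result one cycle too early."""
    stalled = []
    for i in range(1, len(packets)):
        prev_mem = packets[i - 1][1]
        if prev_mem.split()[0] != "lw":
            continue
        loaded = regs(prev_mem)[0]
        # Either slot of the following packet may be the offending reader.
        if any(loaded in sources(instr) for instr in packets[i]):
            stalled.append(i)
    return stalled


too_early = [
    ("nop", "lw $t0, 0($s1)"),
    ("addu $t0, $t0, $s2", "nop"),   # reads $t0 in the very next packet
]
spaced_out = [
    ("nop", "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", "nop"),    # independent work fills the gap
    ("addu $t0, $t0, $s2", "nop"),
]
print(load_use_stalls(too_early))    # [1]
print(load_use_stalls(spaced_out))   # []
```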

slide-21
SLIDE 21

TIM really needs to rely on the compiler.

slide-22
SLIDE 22

EXAMPLE

Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop

How might we schedule this for TIM?

slide-23
SLIDE 23

EXAMPLE

        ALU/branch ins.          Data xfer ins.      Clock cycle
Loop:   nop                      lw $t0, 0($s1)      1
        addi $s1, $s1, -4        nop                 2
        addu $t0, $t0, $s2       nop                 3
        bne $s1, $zero, Loop     sw $t0, 4($s1)      4

The sw offset becomes 4($s1) because the addi has already decremented $s1 by the time the store issues.
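
Counting the schedule: the five loop instructions issue in four clock cycles. A quick check of what that means for CPI, compared with the ideal of 0.5 for a two-issue machine (the function names are illustrative only):

```python
def cpi(instructions, cycles):
    """Clock cycles per instruction."""
    return cycles / instructions


def ipc(instructions, cycles):
    """Instructions per clock cycle."""
    return instructions / cycles


loop_cpi = cpi(instructions=5, cycles=4)
loop_ipc = ipc(instructions=5, cycles=4)
ideal_cpi = 1 / 2  # two issue slots, both always full

print(loop_cpi, loop_ipc, ideal_cpi)  # 0.8 1.25 0.5
```

Only one of the four packets fills both slots, which is why the loop lands well short of the ideal.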

slide-24
SLIDE 24

LOOP UNROLLING

(compiler technique) For loops that access arrays, make multiple copies of the loop body. Schedule instructions from different iterations together.

slide-25
SLIDE 25

LOOP UNROLLING

Challenge: how does this work? (p396 - 398)
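
One way to picture the idea, sketched in Python instead of MIPS. The rolled version mirrors the lw/addu/sw loop (add a value to each array element, walking downward). The unrolled version makes four copies of the body per trip and uses a separate temporary for each copy so the copies do not depend on one another. The unroll factor of 4 and the requirement that the length be a multiple of 4 are assumptions of this sketch.

```python
def add_scalar(a, s):
    """Rolled loop: one element per iteration."""
    i = len(a) - 1
    while i >= 0:
        a[i] = a[i] + s
        i -= 1
    return a


def add_scalar_unrolled4(a, s):
    """Same loop unrolled by 4; assumes len(a) is a multiple of 4."""
    assert len(a) % 4 == 0
    i = len(a) - 1
    while i >= 0:
        # Four independent copies of the body, each with its own temporary,
        # so a scheduler is free to interleave their loads, adds and stores.
        t0 = a[i] + s
        t1 = a[i - 1] + s
        t2 = a[i - 2] + s
        t3 = a[i - 3] + s
        a[i] = t0
        a[i - 1] = t1
        a[i - 2] = t2
        a[i - 3] = t3
        i -= 4  # one index update and one loop test per four elements
    return a


data = list(range(8))
print(add_scalar(list(data), 10) == add_scalar_unrolled4(list(data), 10))  # True
```

The payoff is fewer index updates and branches per element, plus more independent instructions available to fill both issue slots.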

slide-26
SLIDE 26

HOMEWORK

  • Reading 31
  • Continue Project 8

TIMmeh!