SLIDE 1
CSCI341 Lecture 37, Introduction to Parallelism
SLIDE 2
PIPELINING
Exploits potential parallelism among instructions.
Instruction-level parallelism
SLIDE 3
INSTRUCTION-LEVEL PARALLELISM
- Increase depth of pipeline (greater overlap of instructions)
- Replicate hardware (handle more instructions simultaneously)
- aka “multiple issue”
SLIDE 4
MULTIPLE ISSUE
- Instruction execution rate can exceed the clock rate
- CPI less than 1
SLIDE 5
EXAMPLE
A 4GHz four-way multiple issue microprocessor...
- 16 billion instructions per second
- Ideal CPI of 0.25 (IPC of 4)
- In a five-stage pipeline, 20 instructions in progress at once
(modern CPUs approach 3 - 6 instructions per cycle)
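The figures on this slide follow from the clock rate, issue width, and pipeline depth. A minimal sketch of the arithmetic (variable names are illustrative, not from the lecture):

```python
# Peak figures for the 4 GHz, four-way multiple-issue example.
clock_hz = 4_000_000_000   # 4 GHz clock
issue_width = 4            # instructions issued per cycle (four-way)
pipeline_stages = 5        # five-stage pipeline

peak_ips = clock_hz * issue_width          # instructions per second at peak
ideal_cpi = 1 / issue_width                # cycles per instruction, best case
in_flight = pipeline_stages * issue_width  # instructions in progress at once

print(peak_ips, ideal_cpi, in_flight)
```

These are best-case numbers: they assume every issue slot is filled on every cycle.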
SLIDE 6
2 IMPORTANT IMPLEMENTATIONS
- Compile-time (statically)
- During execution (dynamically)
SLIDE 7
CHALLENGES
- How does the CPU determine how many instructions (and which instructions) can be issued?
- How do we deal with data/control hazards?
SLIDE 8
SPECULATION
- The compiler / CPU “guesses” about the properties of an instruction.
- e.g., branching, storing & loading
- Potential for bad guesses (changing the decision is complex)
- Buffering speculated instructions
- Buffering exceptions
SLIDE 9
STATIC MULTIPLE ISSUE SYSTEM
Heavy reliance on the compiler.
SLIDE 10
ISSUE PACKET
- Set of instructions issued in a given clock cycle
- Very Long Instruction Word (VLIW)
SLIDE 11
CONSIDER...
A two-issue MIPS processor.
- One instruction can be ALU operation or branch
- The other can be load/store
(let’s call it “TIM”)
SLIDE 12
TIM
- How many bits of instructions per cycle?
- Instructions paired, aligned.
- ALU/branch instruction is “first.”
- If one member of the pair can’t be used, replace with nop.
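The pairing rules can be sketched as a toy packer. This is an illustration under stated assumptions, not TIM's actual issue logic: it only checks slot types, keeps program order, and ignores data and control hazards. The opcode sets, the `pack` function, and the sample instructions are all hypothetical.

```python
INSTRUCTION_BITS = 32
PACKET_BITS = 2 * INSTRUCTION_BITS  # two 32-bit instructions fetched per cycle

# Assumed opcode classes for the two slots (illustrative, not exhaustive).
ALU_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne"}
DATA_XFER = {"lw", "sw"}
NOP = "nop"

def opcode(ins):
    """Return the mnemonic of an instruction string such as 'lw $t0, 0($s1)'."""
    return ins.split()[0]

def pack(instructions):
    """Pack instructions, in order, into (ALU/branch, load/store) pairs.

    A pair is formed only when an ALU/branch instruction is immediately
    followed by a load/store. Any unused slot is filled with a nop.
    Hazards are NOT checked; that is the compiler's job in a real schedule.
    """
    packets = []
    i = 0
    while i < len(instructions):
        first, second = NOP, NOP
        op = opcode(instructions[i])
        if op in ALU_BRANCH:
            first = instructions[i]
            i += 1
            if i < len(instructions) and opcode(instructions[i]) in DATA_XFER:
                second = instructions[i]
                i += 1
        elif op in DATA_XFER:
            second = instructions[i]
            i += 1
        else:
            raise ValueError(f"unsupported instruction: {instructions[i]}")
        packets.append((first, second))
    return packets

# Hypothetical, mutually independent instructions.
packets = pack([
    "addi $t1, $t1, 1",
    "lw $t0, 0($s1)",
    "sw $t2, 4($s1)",
    "addu $t3, $t4, $t5",
    "lw $t6, 8($s1)",
])
for p in packets:
    print(p)
```

The lone `sw` gets a nop in the ALU/branch slot, which is the "replace with nop" rule above. `PACKET_BITS` works out to 64 bits fetched per cycle.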
SLIDE 13
TIM
Two instructions per stage at a time.
SLIDE 14
TIM HAZARDS
- Sometimes, it’s the compiler’s full responsibility
- Remove hazards by arranging/scheduling instructions
- Inserting NOPs where necessary, etc
SLIDE 15
TIM HAZARDS
- Sometimes, the hardware detects hazards between issue packets
- Generates stalls
- Still relies on compiler to generate appropriate packets
SLIDE 16
TIM’S DATAPATH
SLIDE 17
TIM’S DATAPATH
- 32 more bits from instruction memory
- Two more read ‘ports’
- One more write ‘port’
- Extra ALU
SLIDE 18
NO MAGIC SPEED BOOST
Potential to double performance. Potential for hazards to impact two instructions.
SLIDE 19
USE LATENCY
that’s a noun
“Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.”
SLIDE 20
MIPS
use latency of one cycle
TIM
Potentially impacts two instructions.
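One way to see the effect of use latency is a small checker that flags packets reading a load's result too early. This is a sketch under assumptions: the `Ins` record, `load_use_violations`, and the one-cycle default are illustrative, and only loads are tracked. Instructions sharing a packet with the load are not checked here.

```python
from typing import NamedTuple, Optional, Tuple

class Ins(NamedTuple):
    op: str                 # mnemonic, e.g. "lw"
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read

NOP = Ins("nop", None, ())

def load_use_violations(packets, use_latency=1):
    """Return (load_packet, consumer_packet) index pairs that would stall.

    A packet issued within `use_latency` cycles after a load cannot use
    the loaded value without stalling the pipeline.
    """
    violations = []
    for i, packet in enumerate(packets):
        for ins in packet:
            if ins.op != "lw":
                continue
            last = min(i + 1 + use_latency, len(packets))
            for j in range(i + 1, last):
                if any(ins.dest in other.srcs for other in packets[j]):
                    violations.append((i, j))
    return violations

lw = Ins("lw", "$t0", ("$s1",))
addu = Ins("addu", "$t0", ("$t0", "$s2"))
addi = Ins("addi", "$s1", ("$s1",))
sw = Ins("sw", None, ("$t0", "$s1"))
bne = Ins("bne", None, ("$s1", "$zero"))

# addu issued in the packet right after the load: stalls.
too_early = [(NOP, lw), (addu, NOP)]
# An independent addi fills the slot after the load: no stall.
spaced = [(NOP, lw), (addi, NOP), (addu, NOP), (bne, sw)]

print(load_use_violations(too_early))
print(load_use_violations(spaced))
```

On single-issue MIPS each packet holds one instruction, so one instruction is affected. On TIM the packet after the load holds two instructions, and neither may read the loaded register.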
SLIDE 21
TIM really needs to rely on the compiler.
SLIDE 22
EXAMPLE
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
How might we schedule this for TIM?
SLIDE 23
EXAMPLE
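        ALU/branch ins.          Data xfer ins.       clock cycle
Loop:                            lw $t0, 0($s1)       1
        addi $s1, $s1, -4                             2
        addu $t0, $t0, $s2                            3
        bne $s1, $zero, Loop     sw $t0, 4($s1)       4
The sw offset becomes 4 because addi has already decremented $s1 by the time the store issues.
Original loop:
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop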
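How close does this schedule get to the two-issue ideal? A minimal sketch that counts filled slots in the four-cycle schedule (the list layout is an illustrative encoding, with None for an empty slot):

```python
# Each tuple is one issue packet: (ALU/branch slot, data transfer slot).
schedule = [
    (None, "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", None),
    ("addu $t0, $t0, $s2", None),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]

cycles = len(schedule)
useful = sum(slot is not None for packet in schedule for slot in packet)

cpi = cycles / useful   # cycles per instruction for one loop iteration
ipc = useful / cycles   # instructions per cycle

print(useful, cycles, cpi, ipc)
```

Five instructions in four cycles gives a CPI of 0.8 (IPC of 1.25), against a best case of 0.5 (IPC of 2) for a two-issue machine. Three of the eight slots go unused.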
SLIDE 24
LOOP UNROLLING
(compiler technique) For loops that access arrays, make multiple copies of the loop body. Schedule instructions from different iterations together.
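A source-level analogue of the idea, written in Python rather than MIPS. This is a sketch under assumptions: the function names are hypothetical, the unroll factor of 4 is chosen for illustration, and the array length is assumed to be a multiple of 4 so no cleanup loop is needed.

```python
def add_scalar(a, s):
    """Original loop: one element per iteration, walking from the end."""
    i = len(a) - 1
    while i >= 0:
        a[i] += s
        i -= 1

def add_scalar_unrolled4(a, s):
    """Same loop with the body copied four times per iteration.

    Assumes len(a) is a multiple of 4 (no cleanup loop in this sketch).
    """
    assert len(a) % 4 == 0
    i = len(a) - 1
    while i >= 0:
        # Four independent copies of the body, from four iterations.
        a[i] += s
        a[i - 1] += s
        a[i - 2] += s
        a[i - 3] += s
        i -= 4  # index update and loop test paid once per four elements

x = list(range(8))
y = list(range(8))
add_scalar(x, 10)
add_scalar_unrolled4(y, 10)
print(x == y)
```

The four copies touch different elements, so they do not depend on one another. That gives the compiler independent loads, adds, and stores to pair in issue packets, and the loop overhead is spread over four elements.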
SLIDE 25
LOOP UNROLLING
Challenge: how does this work? (p396 - 398)
SLIDE 26
HOMEWORK
- Reading 31
- Continue Project 8