
CSCI341 Lecture 37, Introduction to Parallelism

  1. CSCI341 Lecture 37, Introduction to Parallelism

  2. PIPELINING “Exploits potential parallelism among instructions.” “Instruction-level parallelism”

  3. INSTRUCTION-LEVEL PARALLELISM • Increase depth of pipeline (greater overlap of instructions) • Replicate hardware (handle more instructions simultaneously) • aka “multiple issue”

  4. MULTIPLE ISSUE • Instruction execution rate can exceed the clock rate • CPI less than 1

  5. EXAMPLE A 4GHz four-way multiple issue microprocessor... • 16 billion instructions per second • Ideal CPI of 0.25 (IPC of 4) • In a five-stage pipeline, 20 instructions in progress at once (modern CPUs approach 3 - 6 instructions per cycle)
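
The arithmetic behind this example can be checked with a short Python sketch. The variable names are illustrative and not from the lecture; the numbers come straight from the slide, and the results are peak figures that assume no stalls.

```python
# Peak-throughput arithmetic for the slide's example (no hazards or stalls assumed).
clock_hz = 4_000_000_000   # 4 GHz clock
issue_width = 4            # four-way multiple issue
pipeline_stages = 5        # five-stage pipeline

peak_ips = clock_hz * issue_width           # instructions per second at peak
ideal_cpi = 1 / issue_width                 # cycles per instruction
ideal_ipc = issue_width                     # instructions per cycle
in_flight = issue_width * pipeline_stages   # instructions in progress at once

print(f"peak rate : {peak_ips:,} instructions/second")
print(f"ideal CPI : {ideal_cpi} (IPC of {ideal_ipc})")
print(f"in flight : {in_flight} instructions")
```

Real programs fall short of these peaks because hazards keep some issue slots empty.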

  6. 2 IMPORTANT IMPLEMENTATIONS • Compile-time (statically) • During execution (dynamically)

  7. CHALLENGES • How does the CPU determine how many instructions (and which instructions) can be issued? • How do we deal with data/control hazards?

  8. SPECULATION • The compiler / CPU “guesses” about the properties of an instruction. • e.g., branching, storing & loading • Potential for bad guesses (changing the decision is complex) • Buffering speculated instructions • Buffering exceptions

  9. STATIC MULTIPLE ISSUE SYSTEM Heavy reliance on the compiler.

  10. ISSUE PACKET • Set of instructions issued in a given clock cycle • Very Long Instruction Word (VLIW)

  11. CONSIDER... A two-issue MIPS processor (let’s call it “TIM”). • One instruction can be an ALU operation or a branch • The other can be a load or store

  12. TIM • How many bits of instructions per cycle? • Instructions paired, aligned. • ALU/branch instruction is “first.” • If one member of the pair can’t be used, replace with nop.
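
As a rough illustration of the pairing rule, the Python sketch below builds one TIM issue packet. The helper `make_packet` and the tuple representation are hypothetical and used here only for illustration; the sketch assumes standard 32-bit MIPS instructions, so a two-slot packet is 64 bits per cycle.

```python
NOP = "nop"
INSTRUCTION_BITS = 32   # every MIPS instruction is 32 bits wide
SLOTS_PER_PACKET = 2    # TIM issues two instructions per cycle

def make_packet(alu_or_branch=None, load_or_store=None):
    """Return (ALU/branch slot, load/store slot), padding an unused slot with nop."""
    return (alu_or_branch or NOP, load_or_store or NOP)

# Bits of instruction fetched per cycle for one packet.
packet_bits = SLOTS_PER_PACKET * INSTRUCTION_BITS

# A load with no ALU/branch partner: the first slot becomes a nop.
lonely_load = make_packet(load_or_store="lw $t0, 0($s1)")
# A full pair: ALU/branch first, data transfer second.
full_pair = make_packet("bne $s1, $zero, Loop", "sw $t0, 4($s1)")

print(packet_bits, lonely_load, full_pair)
```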

  13. TIM Two instructions per stage at a time.

  14. TIM HAZARDS • Sometimes, it’s the compiler’s full responsibility • Remove hazards by arranging/scheduling instructions • Inserting NOPs where necessary, etc

  15. TIM HAZARDS • Sometimes, the hardware detects hazards between issue packets • Generates stalls • Still relies on compiler to generate appropriate packets

  16. TIM’S DATAPATH

  17. TIM’S DATAPATH • 32 more bits from instruction memory • Two more read “ports” • One more write “port” • Extra ALU

  18. NO MAGIC SPEED BOOST Potential to double performance. Potential for hazards to impact two instructions.

  19. USE LATENCY (that’s a noun) “Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.”

  20. TIM • MIPS loads have a use latency of one cycle • In TIM, that one cycle potentially impacts two instructions.

  21. TIM really needs to rely on the compiler.

  22. EXAMPLE
      Loop: lw   $t0, 0($s1)
            addu $t0, $t0, $s2
            sw   $t0, 0($s1)
            addi $s1, $s1, -4
            bne  $s1, $zero, Loop
      How might we schedule this for TIM?

  23. EXAMPLE
             ALU/branch ins.          Data xfer ins.      Clock cycle
      Loop:                           lw $t0, 0($s1)      1
             addi $s1, $s1, -4                            2
             addu $t0, $t0, $s2                           3
             bne $s1, $zero, Loop     sw $t0, 4($s1)      4
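
To see how far this schedule is from the two-issue ideal, the Python sketch below counts the filled slots. The list-of-tuples representation is an assumption made for illustration; the instructions and their cycle assignments are copied from the slide, with empty slots written as nop.

```python
NOP = "nop"

# One (ALU/branch slot, data-transfer slot) pair per clock cycle, as on the slide.
schedule = [
    (NOP,                    "lw $t0, 0($s1)"),   # cycle 1
    ("addi $s1, $s1, -4",    NOP),                # cycle 2
    ("addu $t0, $t0, $s2",   NOP),                # cycle 3
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),   # cycle 4
]

cycles = len(schedule)
useful = sum(1 for packet in schedule for slot in packet if slot != NOP)

cpi = cycles / useful   # cycles per useful instruction
ipc = useful / cycles   # useful instructions per cycle

print(f"{useful} instructions in {cycles} cycles: CPI = {cpi}, IPC = {ipc}")
```

Only one of the four packets is full, so the loop runs well short of the ideal CPI of 0.5. Note also that the store offset changes from 0 to 4: the addi now executes before the sw, so $s1 has already been decremented by 4 when the store computes its address.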

  24. LOOP UNROLLING (compiler technique) For loops that access arrays, make multiple copies of the loop body. Schedule instructions from different iterations together.

  25. LOOP UNROLLING Challenge: how does this work? (p396 - 398)
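
The idea can be sketched in Python rather than MIPS. The example loop adds a scalar to each array element, so the functions below do the same; the function names, the unroll factor of 4, and the requirement that the length be a multiple of 4 are assumptions made for illustration, not details from the lecture.

```python
def add_scalar_rolled(values, scalar):
    """One element per iteration, walking downward like the example loop."""
    out = list(values)
    i = len(out) - 1
    while i >= 0:
        out[i] += scalar
        i -= 1              # one index update and one branch per element
    return out

def add_scalar_unrolled4(values, scalar):
    """Four copies of the loop body per iteration (length must be a multiple of 4)."""
    out = list(values)
    if len(out) % 4 != 0:
        raise ValueError("this sketch assumes the length is a multiple of 4")
    i = len(out) - 1
    while i >= 0:
        # The four bodies touch different elements, so they are independent
        # and a scheduler is free to interleave their loads, adds, and stores.
        out[i] += scalar
        out[i - 1] += scalar
        out[i - 2] += scalar
        out[i - 3] += scalar
        i -= 4              # one index update and one branch per four elements
    return out

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(add_scalar_rolled(data, 10))
print(add_scalar_unrolled4(data, 10))
```

Both versions produce the same result. The unrolled one gives the compiler more independent instructions to pair into issue packets and spends less of its time on loop overhead.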

  26. HOMEWORK • Reading 31 • Continue Project 8 TIMmeh!
